Background: Earlier this month I upgraded my production VMware vSphere environment from 6.7 U3 to 7.0.0. All the hosts in the cluster are HPE DL380 Gen10 servers. I used the HPE custom image to upgrade the ESXi hosts.
Issue/Error: After the upgrade, full VM backups failed, and I was unable to take a snapshot of any VM in the cluster. I verified that there were no issues with the LUNs/datastores or the storage arrays. When I connected to a host over SSH and ran df -h, I got these errors:
VmFileSystem: Slow refresh failed; Unable to get FS Attrs for /vmfs/volumes/<datastore-UUID>
Error when running esxcli, return status was: 1
Errors:
Error getting data for the filesystem on '/vmfs/volumes/<datastore-UUID>', skipping
The good news is that I was still able to access all the datastores by name, using the path /vmfs/volumes/datastorename.
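If df -h reports these errors for several datastores, it can help to pull the affected volume identifiers out of the captured output so you know which ones to check by name. Below is a minimal sketch of that idea, not a VMware tool. The function name failing_volumes, the regular expression, and the list of error markers are my own assumptions, based only on the two error lines quoted above, and the UUID in the sample is made up.

```python
import re
from typing import List

# Assumption: the volume identifier is whatever follows /vmfs/volumes/
# up to whitespace, a quote, a comma or a semicolon. This is inferred
# from the two sample error lines only.
VOLUME_PATH = re.compile(r"/vmfs/volumes/([^\s'’,;]+)")

# Assumption: these substrings identify the error lines of interest.
ERROR_MARKERS = (
    "Unable to get FS Attrs",
    "Error getting data for the filesystem",
)


def failing_volumes(output: str) -> List[str]:
    """Return the unique volume identifiers named in error lines, in order seen."""
    seen: List[str] = []
    for line in output.splitlines():
        # Skip lines that are not one of the known error messages.
        if not any(marker in line for marker in ERROR_MARKERS):
            continue
        for volume in VOLUME_PATH.findall(line):
            if volume not in seen:
                seen.append(volume)
    return seen


if __name__ == "__main__":
    # Sample text modelled on the errors above; the UUID is invented.
    sample = (
        "VmFileSystem: Slow refresh failed; Unable to get FS Attrs for "
        "/vmfs/volumes/5e8f1a2b-3c4d5e6f-7a8b-0123456789ab\n"
        "Error when running esxcli, return status was: 1\n"
        "Error getting data for the filesystem on "
        "'/vmfs/volumes/5e8f1a2b-3c4d5e6f-7a8b-0123456789ab', skipping\n"
    )
    for volume in failing_volumes(sample):
        print(volume)
```

To use it, save the df -h output (including the error lines) to a file, read it into a string, and pass it to failing_volumes. Each identifier is reported once, even if it appears in both error messages.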
Resolution: When upgrading to a major version using the HPE custom image, update the firmware on the host first. I used the latest HPE SPP (Service Pack for ProLiant) to update the firmware on the host, which resolved the issue. I believe major changes were made to the driver architecture in ESXi 7.0.
If you use the HPE custom image to upgrade, the altbootbank partition on the host will be blank, since HPE does not provide an option to roll back to the previous version.
The issue resurfaced after a week.
According to VMware, it is a known issue that will be fixed in a future update:
https://kb.vmware.com/s/article/80188
This is a known issue in VMFS6: in certain workflows, memory is allocated but never freed, which results in VMFS heap exhaustion.