A100 SUPPORT ON VSPHERE 6.7 or 7

Hi all,
I have a configuration problem in our lab involving the NVIDIA A100 on ESXi 7.0 U1.
I followed the VMware blog post (https://blogs.vmware.com/apps/2020/09/vsphere-7-0-u1-with-multi-instance-gpus-mig-on-the-nvidia-a100-for-machine-learning-applications-part-1-introduction.html) and the NVIDIA MIG deployment guide.

On the vSphere 7.0 U1 host I’ve tried both MIG-backed and time-sliced vGPU profiles. Neither works correctly.
After enabling the MIG feature on the GPU, creating a GPU Instance and a Compute Instance, and assigning the vGPU profile to a virtual machine, powering on the VM fails and vSphere reports this error:

could not initialize plugin /usr/lib64/vmware/plugin/libnvidia-vgx.so for vGPU grid_a100-10c

Running nvidia-smi on the ESXi host after the error shows the following message:

Unable to determine the device handle for GPU 0000:3b:00.0: GPU is lost. Reboot the system to recover this GPU

We have to reboot the host to make the GPU “operational” again. I’ve also tried creating only the GPU Instance, without the Compute Instance.
The same happens with time-sliced vGPU profiles (MIG disabled, using “normal” vGPU). I’ve tried the latest vGPU drivers, 11.2 and 12.0, with the same result.
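For anyone trying to reproduce this, the MIG steps described above roughly correspond to the nvidia-smi calls below. This is only a minimal Python sketch, not a verified procedure: it assumes GPU index 0 and Python 3.8+ wherever nvidia-smi runs, and the GPU instance profile ID (9) and GPU instance ID (1) are hypothetical placeholders. Check “nvidia-smi mig -lgip” for the profile IDs your card actually reports, and “nvidia-smi mig --help” for the exact flags of your driver version. By default it only prints the commands.

```python
import shlex
import subprocess

# Hypothetical placeholder values: adjust to what "nvidia-smi mig -lgip" reports.
GPU_INDEX = 0
GI_PROFILE_ID = 9   # placeholder GPU instance profile ID
GI_ID = 1           # placeholder ID of the GPU instance created in step 3


def mig_setup_commands(gpu=GPU_INDEX, gi_profile=GI_PROFILE_ID, gi_id=GI_ID):
    """Return the nvidia-smi commands for the MIG setup steps, in order."""
    return [
        # 1. Enable MIG mode on the GPU (may need a GPU reset or host reboot).
        ["nvidia-smi", "-i", str(gpu), "-mig", "1"],
        # 2. List the GPU instance profiles this card supports.
        ["nvidia-smi", "mig", "-i", str(gpu), "-lgip"],
        # 3. Create a GPU instance from the chosen profile.
        ["nvidia-smi", "mig", "-i", str(gpu), "-cgi", str(gi_profile)],
        # 4. Create a compute instance inside that GPU instance
        #    (assumption: -cci without a profile uses the default; verify with --help).
        ["nvidia-smi", "mig", "-i", str(gpu), "-gi", str(gi_id), "-cci"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(mig_setup_commands())
```

Keeping the commands as lists makes it easy to review them before running anything on a host where a failure can leave the GPU in the “GPU is lost” state.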
The VMware version is ESXi 7.0 U1 build 17325551.
SR-IOV is enabled in the BIOS. The host server is a Dell R740.
I’ve tried two different cards with the same result. I also tried passthrough, but the card always ends up in an unrecoverable state.
Can anyone help?

Do you have adequate cooling for the GPU?

The A100 is simply not supported on vSphere 7 yet. Support should arrive with the next release, U2.

Hi Adam,

I am having the same issue, only on an HPE DL385 Gen10+.
I followed the same post and got the same results with ESXi-7.0U1d-17551050.

Officially, only RHEL 8.2/8.3 with KVM currently supports MIG on the A100.
Fun fact: it doesn’t work there either! Even a fresh out-of-the-box RHEL install with KVM will not load “nvidia_vgpu_vfio”. The driver installs without errors, but after a reboot “lsmod | grep vfio” shows no “nvidia_vgpu_vfio”, and “nvidia-smi” cannot be accessed.
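The lsmod check above can also be scripted, since lsmod reads its data from /proc/modules. A minimal Python sketch, assuming a Linux (e.g. RHEL/KVM) host; module_loaded is a hypothetical helper name. Unlike “grep vfio”, it matches module names exactly, so “vfio” does not match “nvidia_vgpu_vfio”.

```python
from pathlib import Path

PROC_MODULES = Path("/proc/modules")


def module_loaded(name, modules_text=None):
    """Return True if a kernel module called `name` appears in /proc/modules.

    The first whitespace-separated field of each /proc/modules line is the
    module name, the same information "lsmod" prints in its first column.
    """
    if modules_text is None:
        modules_text = PROC_MODULES.read_text()
    return any(
        line.split()[0] == name
        for line in modules_text.splitlines()
        if line.strip()
    )


if __name__ == "__main__":
    if PROC_MODULES.exists():
        for mod in ("nvidia", "nvidia_vgpu_vfio"):
            state = "loaded" if module_loaded(mod) else "NOT loaded"
            print(f"{mod}: {state}")
    else:
        print("/proc/modules not found (not a Linux host?)")
```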

Maybe sschaber has an idea what to do with RHEL, or any information on when VMware will release U2.

I am not able to install the vGPU drivers on vSphere 7.0 U2. Can somebody confirm that the A100 80GB works on vSphere 7.0 U2?