Driver not Loading - VMware ESXi 8 Update 1 + 15.2 vGPU Driver

I’m trying to set up my lab.

Platform: AMD Ryzen Threadripper PRO 5955WX
GPU: NVIDIA RTX A5000
OS: VMware vSphere ESXi 8.0 Update 1

I signed up for an evaluation account and downloaded the drivers a month ago. I followed the guide to install the VIBs for the vGPU driver and the management daemon.

NVD_bootbank_NVD-VMware_ESXi_8.0.0_Driver_525.105.14-1OEM.800.1.0.20613240.vib  
NVD_bootbank_nvdgpumgmtdaemon_525.105.14-1OEM.700.1.0.15843807.vib

After installing both, I took the host out of maintenance mode and restarted it. I knew I was in trouble right away when errors showed up during boot. I checked and found the following entries in vmkwarning.log:

2023-06-03 Al(177) vmkalert: cpu9:2099677)ALERT: NVIDIA: module load failed during VIB install/upgrade.
2023-06-03 Al(177) vmkalert: cpu14:2100020)ALERT: NVIDIA: Device Groups generation failed.  
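
For anyone wanting to pull the same entries on their own host, here is a minimal sketch of a filter for the NVIDIA alert lines. The default path /var/log/vmkwarning.log and the helper name nvidia_alerts are assumptions for illustration; pass a different file as the first argument if the log lives elsewhere.

```shell
# Sketch: print NVIDIA-related ALERT lines from a vmkernel warning log.
# The default log path is an assumption; override it with $1 if needed.
# nvidia_alerts is a hypothetical helper name, not an ESXi command.
nvidia_alerts() {
    log="${1:-/var/log/vmkwarning.log}"
    # Match the "ALERT: NVIDIA" text that appears in the entries above.
    grep 'ALERT: NVIDIA' "$log"
}
```

Running `nvidia_alerts` with no argument reads the assumed default log; `nvidia_alerts /path/to/copy.log` reads a saved copy instead.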

To double-check the error and test the install, I first ran

/etc/init.d/nvdGpuMgmtDaemon status  

and received the expected output

daemon_nvdGpuMgmtDaemon is running  

Then I ran nvidia-smi and got this error:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.  
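
Since that error means nvidia-smi cannot reach the kernel driver, a reasonable follow-up is to check whether the VIBs are registered, whether the kernel module is loaded, and whether the host sees the GPU on the PCI bus. The sketch below wraps three read-only commands from the ESXi shell. It assumes the VIB names contain "NVD" (as in the file names above) and that the module name contains "nvidia"; nvidia_driver_checks is a hypothetical helper name.

```shell
# Sketch: three read-only checks to run in the ESXi host shell.
# Assumptions: VIB names contain "NVD" and the module name contains "nvidia".
nvidia_driver_checks() {
    echo "== Installed NVIDIA VIBs =="
    esxcli software vib list | grep -i 'nvd' || echo "(none found)"

    echo "== Loaded NVIDIA kernel modules =="
    vmkload_mod -l | grep -i 'nvidia' || echo "(none loaded)"

    echo "== NVIDIA PCI devices =="
    lspci | grep -i 'nvidia' || echo "(none visible)"
}
```

If the VIBs are listed and the GPU is visible but no module is loaded, that would be consistent with the "module load failed" alert in the log.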

I tried uninstalling and reinstalling the VIBs twice, but that didn't help. What should I do to troubleshoot?

Thanks.
