I’m trying to set up my lab.
Platform: AMD Ryzen Threadripper PRO 5955WX
GPU: NVIDIA RTX A5000
OS: VMware vSphere ESXi 8.0 Update 1
I signed up for an evaluation account and downloaded the drivers a month ago. I followed the guide to install the VIBs for the vGPU driver and the management daemon.
After installing both, I took the host out of maintenance mode and restarted it. I knew I was in trouble right away when errors showed up during boot. I checked and found the following entries in the vmkwarning log:
2023-06-03 Al(177) vmkalert: cpu9:2099677)ALERT: NVIDIA: module load failed during VIB install/upgrade.
2023-06-03 Al(177) vmkalert: cpu14:2100020)ALERT: NVIDIA: Device Groups generation failed.
To double-check the error and test the install, I first ran
and received the expected output
daemon_nvdGpuMgmtDaemon is running
Then I ran nvidia-smi and got this error:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
I tried uninstalling and reinstalling the VIBs twice, but that didn’t help. What should I do to troubleshoot?