Failed to initialize NVML: Unknown Error

Hello,
I’m pretty new at this so please have patience :)

We have a 3-node VMware cluster running ESXi 6.0 U1a. We have just installed an NVIDIA GRID K1 in one of our hosts.

The host is an IBM x3850 X6, and we have installed the card in slot 4, which is a PCIe x16 slot belonging to CPU3.

I have followed the deployment guide and changed the BIOS settings accordingly:
* Memory Mapped Config Base memory window: changed from Auto to 2 GB (I think it is supposed to be below 4 GB)
* 64-bit PCI Resource: changed from Enabled to Disabled

I have installed the Virtual GPU manager:
esxcli software vib list | grep -i nvidia
NVIDIA-vgx-VMware_ESXi_6.0_Host_Driver 346.42-1OEM.600.0.0.2159203 NVIDIA VMwareAccepted 2015-12-14

The module is loaded:
esxcfg-module -l | grep nvidia
nvidia 0 8420

When I run the nvidia-smi command:
nvidia-smi
Failed to initialize NVML: Unknown Error

There's no NVRM output in vmkernel.log:
cat /var/log/vmkernel.log | grep NVRM
[root@ESX-F-1:/var/log]

ESXi doesn't seem to be aware of the NVIDIA card; lspci only finds the onboard graphics controller:
lspci | grep -i display
0000:1b:00.0 Display controller: Matrox Electronics Systems Ltd. G200eR2
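Grepping lspci for "display" depends on how the device class is labelled. Checking the PCI vendor ID is a little more direct: the sketch below counts PCI functions reporting NVIDIA's vendor ID `0x10de` in the output of `esxcli hardware pci list`. It assumes that command prints an indented `Vendor ID: 0x....` line per device; the function name is hypothetical. The K1 carries four GPUs, so an enumerated card should show up as several entries; a count of 0 would suggest the problem sits below the driver (slot, BIOS, power) rather than in the VIB.

```shell
# POSIX sh sketch. Reads `esxcli hardware pci list` output on stdin and prints
# how many PCI functions report NVIDIA's vendor ID (0x10de).
# Assumption: each device has an indented "Vendor ID: 0x...." line; the
# anchored pattern skips the separate "SubVendor ID" line.
count_nvidia_functions() {
  awk '
    /^[ \t]*Vendor ID:/ {
      value = $0
      sub(/^[ \t]*Vendor ID:[ \t]*/, "", value)   # keep only the value
      sub(/[ \t]+$/, "", value)                   # drop trailing whitespace
      if (tolower(value) == "0x10de") n++
    }
    END { print n + 0 }
  '
}

# Usage on the ESXi host:
#   esxcli hardware pci list | count_nvidia_functions
```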

I have struggled with this issue for quite some time now, so I really hope you can help.

/Michael

Do you have the K1s configured for PCI passthrough in vSphere?

If you do, you need to undo that.
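One way to check this from the ESXi shell is a minimal sketch (hypothetical function name) that lists each NVIDIA device from `esxcli hardware pci list` together with its current owner. It assumes each record starts with an unindented address line followed by indented `Vendor ID` and `Current Owner` fields; an owner of `VM Passthru` rather than `VMkernel` would mean the GPU is reserved for passthrough and not available to the vGPU manager.

```shell
# POSIX sh sketch. Reads `esxcli hardware pci list` output on stdin and prints
# "<address> <current owner>" for every function with vendor ID 0x10de.
# Assumption: records start with an unindented address line and contain
# "Vendor ID:" and "Current Owner:" lines in any order.
nvidia_owners() {
  awk '
    function flush() {
      if (vendor == "0x10de") print addr, owner
      addr = ""; vendor = ""; owner = ""
    }
    # An unindented, non-empty line starts a new device record
    /^[^ \t]/ { flush(); addr = $1; next }
    /^[ \t]*Vendor ID:/ { vendor = tolower($NF) }
    /^[ \t]*Current Owner:/ {
      owner = $0
      sub(/^[ \t]*Current Owner:[ \t]*/, "", owner)   # value may contain spaces
      sub(/[ \t]+$/, "", owner)
    }
    END { flush() }
  '
}

# Usage on the ESXi host:
#   esxcli hardware pci list | nvidia_owners
```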

Hi Jason,
No, it's not configured for passthrough in VMware.

"we have installed the card in slot 4 which is a PCIe x16 slot belonging to CPU3"

Is that CPU socket populated?

Hi Jason,

Yes, the CPU socket is populated, but thanks.

I've seen problems myself with mismatches between the ESXi host driver VIB and the ESXi build version. The U1 build is 3029758, and this is reflected in the filename of the VIB. In the past we have installed older or RC GRID drivers and found the same issues you are having (Xorg not starting, no nvidia-smi output, etc.).

If you have ruled out hardware issues and compatibility problems, checking the driver version might be worth a try.
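To make the build comparison concrete, a small sketch that pulls the trailing build field out of the installed VIB's version string and the running host build out of `vmware -v`, so both can be checked against the GRID release notes. The function names are hypothetical, and it assumes the version string looks like the one in the original post (`346.42-1OEM.600.0.0.2159203`, last dot-separated field being the build it was packaged against) and that `vmware -v` prints something like `VMware ESXi 6.0.0 build-3029758`. It only puts the numbers side by side; it does not decide which combinations are supported.

```shell
# POSIX sh sketch: extract build numbers for a side-by-side comparison.

# Last dot-separated field of a VIB version string (assumed target build)
vib_target_build() {
  printf '%s\n' "$1" | awk -F. '{ print $NF }'
}

# Digits following "build-" in `vmware -v` output; prints nothing if absent
host_build() {
  printf '%s\n' "$1" | sed -n 's/.*build-\([0-9][0-9]*\).*/\1/p'
}

# Usage on the ESXi host (column 2 of `esxcli software vib list` is the version):
#   vib=$(esxcli software vib list | awk 'tolower($1) ~ /nvidia/ { print $2 }')
#   echo "VIB build $(vib_target_build "$vib"), host build $(host_build "$(vmware -v)")"
```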
