Is there a good single source for GRID vGPU on ESXi installation and troubleshooting?

Hello,

We are new to GRID and are attempting to bring up a new VDI environment using dual M10 GPUs in VxRail E570F nodes (Dell 14G servers). We are able to install and upgrade the local VIB driver software on the ESXi nodes, but we cannot get it to start and run properly. From what I’ve seen in this forum and elsewhere, there are a lot of variables that could contribute: BIOS, ESXi, vCenter settings, etc. This is worrisome considering we plan to run our most demanding VDI clients on this platform.

Some of the errors we’re running into with the latest driver VIB, dated Oct 5, 2018:

esxcli software vib list | grep NVIDIA
NVIDIA-VMware_ESXi_6.5_Host_Driver 390.94-1OEM.650.0.0.4598673 NVIDIA VMwareAccepted 2018-10-22

[root@dcapcvh01h:/etc/init.d] nvidia-smi
Failed to initialize NVML: Unknown Error

dmesg

2018-10-22T18:39:04.232Z cpu29:66331)DMA: 646: DMA Engine ‘NVIDIADmaEngine’ created using mapper ‘DMAIOMMU’.
NVRM: to your NVIDIA device is not supported by the kernel.
2018-10-22T18:39:04.273Z cpu29:66331)DMA: 691: DMA Engine ‘NVIDIADmaEngine’ destroyed.
2018-10-22T18:39:04.273Z cpu29:66331)DMA: 945: Protecting DMA engine ‘NVIDIADmaEngine’. Putting parent PCI device 0000:3e:00.0 in IOMMU domain 0x43076165e7d0.

NVRM: to your NVIDIA device is not supported by the kernel.
2018-10-22T18:39:06.418Z cpu33:66331)DMA: 691: DMA Engine ‘NVIDIADmaEngine’ destroyed.
2018-10-22T18:42:25.496Z cpu35:70931)ALERT: NVIDIA: module load failed during VIB install/upgrade.
2018-10-22T18:42:25.502Z cpu12:70932)NVIDIA: Starting vGPU Services.
2018-10-22T18:42:25.510Z cpu47:70935)NVIDIA: Starting Xorg service.
2018-10-22T18:42:27.529Z cpu37:71538)NVIDIA: Starting the DCGM node engine.
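
In case it helps anyone reproduce this, here is roughly how the NVIDIA-related lines can be pulled out of the kernel log. This is only a minimal sketch: the function name nvidia_log_lines is made up for illustration, and it assumes nothing beyond a POSIX shell with grep. It just filters whatever text is piped into it for the two prefixes seen in the output above (NVRM and NVIDIA).

```shell
# Hypothetical helper (name is an assumption, not part of any NVIDIA or VMware tool):
# reads log text on stdin and keeps only lines mentioning NVRM or NVIDIA.
nvidia_log_lines() {
    grep -E 'NVRM|NVIDIA'
}

# Typical use on the host, assuming dmesg is available as in the output above:
#   dmesg | nvidia_log_lines
```

This only narrows down the log; it does not explain why the module load fails.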

Thanks for any help and best practices on these bad boys.

Ron

http://sschaber.de/2017/11/20/unable-to-run-vgpu-manager-on-dell-r740-and-esx/