Server: Dell R740 - 384GB RAM, Tesla M10
OS: ESXi 6.5U3
NVIDIA Driver: 460.73.02
GPU: NVIDIA Tesla M10
In the BIOS, I have set:
Memory Mapped I/O Above 4GB: Enabled
Memory Mapped I/O Base: 512GB
These BIOS settings follow the Dell KB article for this error:
https://www.dell.com/support/kbdoc/en-ca/000144038/dell-poweredge-14g-esxi-returns-failed-to-initialize-nvml-unknown-error-with-nvidia-gpu
# esxcli system maintenanceMode set --enable true
# esxcli software vib install -d /path-to-zip/NVIDIA-bootbank-offline-bundle.zip
# esxcli system maintenanceMode set --enable false
# reboot
# nvidia-smi
Failed to initialize NVML: Unknown Error
https://kb.vmware.com/s/article/2064775
# esxcli hardware pci list -c 0x0300 -m 0xf
0000:40:00.0
Address: 0000:40:00.0
Segment: 0x0000
Bus: 0x40
Slot: 0x00
Function: 0x0
VMkernel Name: vmgfx3
Vendor Name: NVIDIA Corporation
Device Name: NVIDIATesla M10
Configured Owner: Unknown
Current Owner: VMkernel
Vendor ID: 0x10de
Device ID: 0x13bd
SubVendor ID: 0x10de
SubDevice ID: 0x1160
Device Class: 0x0300
Device Class Name: VGA compatible controller
Programming Interface: 0x00
Revision ID: 0xa2
Interrupt Line: 0xff
IRQ: 255
Interrupt Vector: 0x00
PCI Pin: 0x00
Spawned Bus: 0x00
Flags: 0x3201
Module ID: -1
Module Name: None
Chassis: 0
Physical Slot: 4294967295
Slot Description: PCIe Slot 1; relative bdf 04:00.0
Passthru Capable: true
Parent Device: PCI 0:60:17:0
Dependent Device: PCI 0:64:0:0
Reset Method: Bridge reset
FPT Sharable: true
As shown above, the GPU reports Module Name: None, which is not correct. It should report Module Name: nvidia.
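To make the same check repeatable across all GPUs on the host, here is a minimal sketch that scans saved `esxcli hardware pci list` output and prints the address of every device whose Module Name is None. The function name `unclaimed_devices` and the file name `pci.txt` are my own placeholders, and I am assuming a POSIX shell with awk and the field layout shown in the output above.

```sh
#!/bin/sh
# Hypothetical helper (not part of esxcli): reads `esxcli hardware pci list`
# output on stdin and prints the PCI address of each device that has
# "Module Name: None", i.e. no driver module has claimed it.
unclaimed_devices() {
    awk '
        # Remember the address of the device block currently being read.
        /^[ \t]*Address:/     { addr = $2 }
        # Fields are "Module", "Name:", "<value>", so the value is $3.
        /^[ \t]*Module Name:/ { if ($3 == "None") print addr }
    '
}

# Example usage, assuming the output was saved to pci.txt beforehand:
#   esxcli hardware pci list -c 0x0300 -m 0xf > pci.txt
#   unclaimed_devices < pci.txt
```

An empty result would mean every listed device has a module bound to it. It only reports what esxcli already shows, so it does not explain why the driver failed to claim the card.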
-
Does anyone know what could be causing the "Failed to initialize NVML: Unknown Error" message?
-
Does xorg need to be started on my ESXi server? I get this error whether or not xorg is running.