L40S unavailable when other GPUs are present on ESXi host

Good afternoon,

Running into a strange issue with vGPU on ESXi. Current versions:
Hypervisor: ESXi 8.0.3 24280767
vGPU Drivers: 16.7
Server: Supermicro X11DPG-SN, 2x Xeon 8272CL
GPUs: L40S, A16 and P40

Problem:
I have 4 hosts with identical configuration, each with an A16 and a P40 GPU. I recently added an L40S to each server, but this makes the vGPU driver unstable and no VMs are able to use any GPU resources. The A16 and P40 still show up as graphics devices normally, but the L40S appears with “0” VRAM. Trying to launch a VM on the P40 or A16 results in a “device is not available on the host” error, and no L40S vGPU profiles are visible on the host at all.
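
For reference, this is roughly how I am checking GPU state on the host (assuming the standard NVIDIA vGPU host driver is installed; adjust for your environment):

esxcli graphics device list    # lists the GPUs ESXi sees as graphics devices
nvidia-smi                     # host driver view; this is where the L40S reports 0 MiB
nvidia-smi vgpu -s             # supported vGPU types per GPU; nothing is listed for the L40S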

If I remove the A16 and P40 (leaving the L40S by itself), everything works perfectly fine. If I switch the P40 and A16 to PCIe passthrough, the L40S also works fine. If either GPU is added back to the host, the L40S stops working.

The only error I am able to find is the vGPU driver restarting over and over with the following messages:

2024-10-14T23:17:06.579Z In(182) vmkernel: cpu82:2099624)NVRM: GPU 0000:db:00.0: RmInitAdapter failed! (0x25:0x56:1468)
2024-10-14T23:17:06.579Z In(182) vmkernel: cpu82:2099624)NVRM: rm_init_adapter failed for device 1
2024-10-14T23:17:06.784Z In(182) vmkernel: cpu82:2099624)NVRM: GPU at 0000:db:00.0 has software scheduler ENABLED with policy BEST_EFFORT.
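
For reference, these are straight from the host's vmkernel log (the path may differ depending on your syslog setup); grepping for NVRM shows the same RmInitAdapter failure repeating:

grep NVRM /var/run/log/vmkernel.log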

I have tried a few different ideas based on some research here. The only thing that made a difference was setting “NVreg_EnableGpuFirmware=0”. With that set, the L40S is visible on the host without any errors, and the P40 and A16 both work fine. However, launching a VM using the L40S results in a strange issue where the host allocates the GPU to the VM, but the VM is unable to initialize the GPU and never loads the driver.
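
For anyone wanting to repeat that test, I set the parameter on the nvidia module and rebooted the host. Roughly the following (this is the module-parameter approach described in the vGPU docs for disabling the GPU firmware; double-check the exact syntax against your driver version):

esxcli system module parameters set -m nvidia -p "NVreg_EnableGpuFirmware=0"
esxcli system module parameters list -m nvidia    # verify the parameter is set
reboot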

I have attached the bug report with everything left “at defaults” and would love some assistance drilling down into this issue.
nvidia-bug-report.log (3.0 MB)

This is simply not going to work, nor is it supported. There was a technical change between Ampere and Ada that does not allow these generations to run in parallel.

Well, that definitely explains what I am seeing. Do you have any documentation I can reference that outlines this? I'll need to explain it and figure out how to move forward.

Thank you for the reply!

Nothing I can share publicly. We changed from a software-based RM to a hardware-based RM (a RISC-V chip on the GPU) starting with Ada GPUs. Hope this helps to clarify that either a software-based or a hardware-based RM is possible, but not both at the same time.

Gotcha, I will pass this on. Thank you again for taking the time to reply!
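
In case it helps anyone else who lands on this thread: a quick way to see which RM a GPU is running under is to query the host driver with nvidia-smi. On recent drivers the detailed query output includes a GSP firmware field, which reads N/A when the firmware-based RM is disabled (the exact field name may vary by driver version):

nvidia-smi -q | grep -i "gsp"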

I have a DL380 server and installed one L40S in it. ESXi recognized it as an L40S, but when I try to get information with nvidia-smi it says it can't communicate with the GPU.
What is the problem?