P40 and gpumodeswitch

The documentation seems to indicate that the P40 does not require a gpumodeswitch. However, after installing the NVIDIA GRID VIB, I see the following (dmesg):

2017-11-07T00:15:17.689Z cpu15:69668)NVRM: loading NVIDIA UNIX x86_64 Kernel Module 384.73 Mon Aug 21 15:16:25 PDT 2017
2017-11-07T00:15:17.689Z cpu15:69668)
2017-11-07T00:15:17.689Z cpu15:69668)Device: 191: Registered driver 'nvidia' from 91
2017-11-07T00:15:17.690Z cpu15:69668)Mod: 4968: Initialization of nvidia succeeded with module ID 91.
2017-11-07T00:15:17.690Z cpu15:69668)nvidia loaded successfully.
2017-11-07T00:15:17.691Z cpu13:66219)IOMMU: 2176: Device 0000:3b:00.0 placed in new domain 0x4304cc3e8af0.
2017-11-07T00:15:17.691Z cpu13:66219)DMA: 945: Protecting DMA engine 'NVIDIADmaEngine'. Putting parent PCI device 0000:3b:00.0 in IOMMU domain 0x4304cc3e8af0.
2017-11-07T00:15:17.691Z cpu13:66219)DMA: 646: DMA Engine 'NVIDIADmaEngine' created using mapper 'DMAIOMMU'.
2017-11-07T00:15:17.691Z cpu13:66219)NVRM: This is a 64-bit BAR mapped above 16 TB by the system
NVRM: BIOS or the VMware ESXi kernel. This PCI I/O region assigned
NVRM: to your NVIDIA device is not supported by the kernel.
NVRM: BAR1 is 32768M @ 0x3820$

This is with vSphere 6.5 Enterprise Plus. I am unable to install the gpumodeswitch VIB to even try it out…
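For reference, VIB installation on an ESXi host is done with esxcli. A minimal sketch of the usual sequence follows; the datastore path and VIB filename are placeholders, so substitute the actual file you downloaded:

```shell
# Put the host in maintenance mode before installing the driver VIB
esxcli system maintenanceMode set --enable true

# Install the NVIDIA VIB (path and filename below are placeholders)
esxcli software vib install -v /vmfs/volumes/datastore1/NVIDIA-VMware_ESXi_6.5_Host_Driver.vib

# Confirm the VIB is registered, then reboot the host
esxcli software vib list | grep -i nvidia
```

After the reboot, take the host out of maintenance mode with `esxcli system maintenanceMode set --enable false`.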

Am I missing a step on the install?

Check your BIOS settings. You need to modify the memory-mapped I/O settings so the system can map the card's large BAR (the log above shows a 32 GB BAR1)…

Which hardware do you use? Check with the OEM.

Thank you for pointing me in the right direction. On a Dell R740xd server, I had to do the following in the BIOS -> Integrated Devices section:

SR-IOV Global Enable -> Enabled (Default was Disabled)
Memory Mapped I/O Base -> 512 GB (Default was 56 TB)

I appear to be on my way: nvidia-smi is returning values now.
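For anyone else verifying the same fix, a quick check on the ESXi host (assuming the driver VIB is already installed) might look like this:

```shell
# Confirm the nvidia kernel module is loaded
esxcli system module list | grep nvidia

# Query the GPU; this should now return card details instead of an error
nvidia-smi

# Re-check the vmkernel log for NVRM/BAR messages after the BIOS change
grep NVRM /var/log/vmkernel.log | tail
```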

Perfect. Thanks for sharing. I'm sure other customers will run into the same issue with this new hardware. I will try to get in contact with Dell to make this the default configuration for GPU-enabled systems.



Thank you for this; I had exactly the same problem with a Dell R740 and an NVIDIA Tesla P40.

Thank you for sharing. I had the same problem and solved it with the BIOS changes above.
After changing:
SR-IOV Global Enable to Enabled
Memory Mapped I/O Base to 512 GB
The server recognizes the GPU card and the nvidia-smi command works fine.
But… when trying to power on a VM with a shared PCI device, I get the following error:
could not initialize plugin '/usr/lib64/vmware/plugin/libnvidia-vgx.so' for vgpu 'grid_p40-2q'

Has anyone of you encountered this issue?

You need to disable ECC on the P40 first.
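For anyone hitting the same vGPU plugin error, ECC can be checked and disabled with nvidia-smi on the host; a reboot is required before the new mode takes effect:

```shell
# Check the current ECC mode for each GPU
nvidia-smi --query-gpu=index,name,ecc.mode.current --format=csv

# Disable ECC on all GPUs (add -i <id> to target a single GPU)
nvidia-smi -e 0

# Reboot the host so the ECC mode change takes effect
```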

Yeah… you're right, got it now. Trying…
Thank you