I’m having trouble getting a Tesla P40 to work when it is assigned to an Ubuntu 22.04 LTS VM via direct passthrough. At boot the system log is spammed by the message in the title, repeating roughly every 300 ms, like this:
NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:0b.0)
On this same host I have a Windows 11 VM that uses the same card via passthrough, and everything works as expected there.
Host system details
DL380p Gen8, 384GB RAM installed
GPU enablement kit installed for 16x PCIe risers/power/etc
Tesla P40, 24GB VRAM
PCI Express 64-Bit BAR Support enabled in hidden BIOS menu
SR-IOV enabled in BIOS
Host is running ESXi 8.0 U1
System boots and runs off of internal graphics
Guest VM details
Installed guest via “ubuntu-22.04.2-live-server-amd64”, no desktop
64GB RAM assigned to VM and reserved
As this card has 16GB or more of VRAM, I’ve added the following Advanced Configuration settings to the ESXi VM
pciPassthru.64bitMMIOSizeGB = 64
pciPassthru.use64bitMMIO = TRUE
Tested with and without the kernel parameter pci=realloc on the GRUB command line; no difference.
If anyone has any ideas as to how I can make this work under a Linux VM, I’d really appreciate it. Given that it works with a Windows VM, I suspect I must be relatively close, but I have been beating my head against this for a week now and figured I’d see if anyone else had any ideas.
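For anyone comparing notes, here is a rough sketch of how the repeated NVRM lines can be boiled down to the distinct BAR complaints. The function name nvrm_bar_errors is just a made-up helper, and the pattern assumes the message format shown above.

```shell
#!/bin/sh
# nvrm_bar_errors: hypothetical helper. Reads kernel log text on stdin and
# prints each distinct "NVRM: BARn is <size> @ <address>" fragment once,
# dropping timestamps and the trailing PCI address.
nvrm_bar_errors() {
    grep -oE 'NVRM: BAR[0-9]+ is [^ ]+ @ [^ ]+' | sort -u
}

# Typical use on the guest (reading the kernel log may require root):
#   sudo dmesg | nvrm_bar_errors
```

A size of 0M at address 0x0 means the guest never gave that BAR an address range, which matches the symptom described here.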
Resolved! This post made for a good rubber ducky :D
As noted above, I had deployed the Windows test system first, and it worked once I added the couple of Advanced Config options needed for cards with large VRAM. I did the same for Linux and it didn’t work, but I just figured out why: VMware defaults to UEFI boot for Win 10+ VMs, but defaults to BIOS boot when the guest OS is set to Ubuntu Linux.
I re-deployed the Ubuntu machine with the two advanced settings and UEFI boot enabled, and now everything is working like it should.
$ nvidia-smi
Mon Jul 17 14:10:44 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P40           Off  | 00000000:0B:00.0 Off |                  Off |
| N/A   40C    P8    11W / 250W |      0MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
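For anyone hitting the same thing, a quick way to confirm which firmware mode a Linux guest actually booted in is to check whether the kernel exposes /sys/firmware/efi. This is only a minimal sketch: boot_mode is a made-up helper name, and the optional path argument is there purely so the check can be pointed at a different directory.

```shell
#!/bin/sh
# boot_mode: hypothetical helper. Prints "UEFI" if the given directory exists
# (default: /sys/firmware/efi, which Linux only creates when booted via EFI),
# otherwise prints "BIOS".
boot_mode() {
    efi_dir="${1:-/sys/firmware/efi}"
    if [ -d "$efi_dir" ]; then
        echo "UEFI"
    else
        echo "BIOS"
    fi
}

boot_mode
```

If this prints BIOS on a VM that has the two pciPassthru settings above, the firmware type in the VM's boot options is the first thing I would check.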