NVRM: This PCI I/O region assigned to your NVIDIA device is invalid

We are getting the following errors after the installation of the latest nvidia-open drivers when using a customized Ubuntu 22.04 image:

Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 3.064063] nvidia 0000:06:00.0: enabling device (0140 -> 0142)
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 3.068427] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 3.068427] NVRM: BAR1 is 0M @ 0x0 (PCI:0000:06:00.0)
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 3.070542] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 3.070542] NVRM: BAR2 is 0M @ 0x0 (PCI:0000:06:00.0)
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 3.072612] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 3.072612] NVRM: BAR3 is 0M @ 0x0 (PCI:0000:06:00.0)
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 3.074694] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 3.074694] NVRM: BAR4 is 0M @ 0x0 (PCI:0000:06:00.0)
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 3.076771] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 3.076771] NVRM: BAR5 is 0M @ 0x0 (PCI:0000:06:00.0)
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 3.125509] NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 570.124.06 Release Build (dvs-builder@U22-I3-AE18-09-6) Wed Feb 26 01:52:55 UTC 2025
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 3.150003] nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64 570.124.06 Release Build (dvs-builder@U22-I3-AE18-09-6) Wed Feb 26 01:43:55 UTC 2025
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 3.155628] [drm] [nvidia-drm] [GPU ID 0x00000600] Loading driver
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 3.208241] ACPI Warning: _SB.PCI0.S14.S00._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20210730/nsarguments-61)
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 4.253056] resource sanity check: requesting [mem 0x80700000-0x816fffff], which spans more than PCI Bus 0000:06 [mem 0x80000000-0x80ffffff]
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 4.253061] caller os_map_kernel_space.part.0+0xb2/0xc0 [nvidia] mapping multiple BARs
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 4.258016] NVRM: kbusVerifyBar2_GM107: MMUTest BAR0 window offset 0x70e000 returned garbage 0x0
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 4.258028] NVRM: nvAssertOkFailedNoLog: Assertion failed: Generic memory error [NV_ERR_MEMORY_ERROR] (0x00000072) returned from kbusVerifyBar2_HAL(pGpu, pKernelBus, NULL, NULL, 0, 0) @ kern_bus_gm107.c:352
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 4.258035] NVRM: nvAssertOkFailedNoLog: Assertion failed: Generic memory error [NV_ERR_MEMORY_ERROR] (0x00000072) returned from kbusStateInitLockedKernel_HAL(pGpu, pKernelBus) @ kern_bus_gm107.c:457
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 4.258040] NVRM: RmInitNvDevice: *** Cannot initialize the device
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 4.258042] NVRM: RmInitAdapter: RmInitNvDevice failed, bailing out of RmInitAdapter
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 4.282338] NVRM: rmapiReportLeakedDevices: Device object leak: (0xc1e00002, 0xcaf00000). Please file a bug against RM-core.
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 4.284816] NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ rmapi.c:935
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 4.290959] NVRM: nvAssertFailedNoLog: Assertion failed: listCount(&pKernelBus->virtualBar2[gfid].usedMapList) == 0 @ kern_bus_vbar2.c:346
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 4.294873] NVOC: __nvoc_objDelete: Child class KernelVideoEngine not freed from parent class OBJGPU.NVRM: iovaspaceDestruct_IMPL: 3 left-over mappings in IOVAS 0x600
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 4.298285] NVRM: GPU 0000:06:00.0: RmInitAdapter failed! (0x24:0x72:1100)
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 4.301930] NVRM: GPU 0000:06:00.0: rm_init_adapter failed, device minor number 0
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 4.316147] [drm:nv_drm_load [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00000600] Failed to allocate NvKmsKapiDevice
Apr 16 12:27:40 aut-ubuntu-test-v6 kernel: [ 4.319946] [drm:nv_drm_register_drm_device [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00000600] Failed to register device

$ nvidia-smi
No devices were found
$ lspci -nnv -d 10de:
06:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:26b5] (rev a1)
Subsystem: NVIDIA Corporation Device [10de:169d]
Physical Slot: 0-5
Flags: bus master, fast devsel, latency 0, IRQ 22
Memory at 80000000 (32-bit, non-prefetchable) [size=16M]
Memory at (64-bit, prefetchable)
Memory at (64-bit, prefetchable)
Capabilities:
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
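
In case it helps anyone comparing a working and a broken image: the relevant part of the log is the set of "BARn is 0M @ 0x0" lines. Below is a minimal sketch for pulling just those out of a kernel log. The helper name `list_invalid_bars` is made up for this post, and the pattern only matches the exact NVRM message format shown above.

```shell
#!/bin/sh
# list_invalid_bars (hypothetical helper): reads a kernel log on stdin and
# prints "<PCI address> <BAR>" for every BAR that NVRM reports as 0M @ 0x0.
list_invalid_bars() {
    sed -n -E 's/.*NVRM: (BAR[0-9]+) is 0M @ 0x0 \(PCI:([0-9a-fA-F:.]+)\).*/\2 \1/p'
}

# Possible usage on the affected guest (reading the kernel log may need root):
#   sudo dmesg | list_invalid_bars
#   journalctl -k -b | list_invalid_bars
```

An empty result after a reboot would suggest that the guest kernel assigned all BARs this time.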

The GPU (L40) is attached via PCI passthrough to a KVM instance managed by OpenStack (2024.1). The instance uses UEFI boot, and the machine type is set to Q35. The underlying host is an HPE DL385 Gen11 (AMD).

While browsing the forum we found the following hints already:

  • use the cuda-drivers instead (rather old tip)
  • add pci=realloc=off as kernel parameter
  • disable secure boot

But none of these helped.

FTR - the official Ubuntu Cloud image works perfectly on our hypervisor nodes.

Any ideas what might be the problem?

nvidia-bug-report.log.gz (4.5 MB)

While digging deeper into this forum I found a solution for our problem.

The kernel option:

pci=nocrs

also needs to be set. Disabling Secure Boot is not necessary.
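
For anyone wondering how to make the option persistent, here is a rough sketch, assuming the image boots via GRUB and that `grub-mkconfig` sources drop-in files from `/etc/default/grub.d/` (as it does on Ubuntu). The helper name `write_grub_dropin` and the `99-` file name prefix are my own choices, not anything required by GRUB; the prefix is only there so the file sorts after other drop-ins such as the cloud image settings.

```shell
#!/bin/sh
# write_grub_dropin (hypothetical helper): writes a GRUB drop-in that appends
# one kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT and prints the file path.
#   $1 = kernel parameter, e.g. pci=nocrs
#   $2 = drop-in directory (defaults to /etc/default/grub.d)
write_grub_dropin() {
    param="$1"
    dir="${2:-/etc/default/grub.d}"
    # Build a file name from the parameter, e.g. pci=nocrs -> 99-pci-nocrs.cfg
    name=$(printf '%s' "$param" | sed 's/[^A-Za-z0-9]/-/g')
    file="$dir/99-$name.cfg"
    # Single quotes keep $GRUB_CMDLINE_LINUX_DEFAULT literal, so the drop-in
    # extends whatever value was set by earlier files instead of replacing it.
    printf 'GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT %s"\n' "$param" > "$file"
    printf '%s\n' "$file"
}
```

On the real system this would have to run as root, followed by `sudo update-grub` and a reboot. Afterwards, `grep -o 'pci=nocrs' /proc/cmdline` should print the option if it made it onto the kernel command line.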

see also:
