RTX 3090 on Arch Linux - nvidia_drm Failed to allocate NvKmsKapiDevice; Failed to register device

Hey NVIDIA Community,

Recently got myself an RTX 3090 that’s connected to a Dell XPS 9320 Plus via a Thunderbolt 3 eGPU enclosure. That card is supposed to drive the Dell 8K 32" monitor, and I have two DP 1.4 cables running from the monitor to the graphics card.

With any of the Arch Linux packages nvidia, nvidia-beta, nvidia-dkms, or nvidia-beta-dkms, I get the following kernel messages, ending in errors, as soon as the relevant modules are loaded.

[    0.028820] Kernel command line: initrd=\intel-ucode.img initrd=\initramfs-linux.img cryptdevice=PARTUUID=f-o-o-b-a-r:luksdev root=/dev/mapper/luksdev zswap.enabled=0 rw rootfstype=f2fs ibt=off nvidia_drm.modeset=1
[   20.406582] nvidia: loading out-of-tree module taints kernel.
[   20.406589] nvidia: module license 'NVIDIA' taints kernel.
[   20.421694] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[   20.563366] nvidia-nvlink: Nvlink Core is being initialized, major device number 235
[   20.565946] nvidia-nvlink: Unregistered Nvlink Core, major device number 235
[  267.361161] nvidia-nvlink: Nvlink Core is being initialized, major device number 507
[  267.361961] nvidia 0000:43:00.0: enabling device (0000 -> 0003)
[  267.362097] nvidia 0000:43:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[  267.430964] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  525.60.11  Wed Nov 23 22:49:17 UTC 2022
[  267.432205] [drm] [nvidia-drm] [GPU ID 0x00004300] Loading driver
[  268.755288] [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00004300] Failed to allocate NvKmsKapiDevice
[  268.755533] [drm:nv_drm_probe_devices [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00004300] Failed to register device

The output above is from nvidia-beta, but the same messages appear with each of the other packages mentioned, which cover driver versions 520.56.06 and 525.60.11.
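
For anyone triaging a similar log, the failing lines can be pulled out mechanically. Here is a minimal Python sketch, assuming the kernel log has been saved as text (for example from `dmesg`). The function name `find_nvidia_drm_errors` is hypothetical, and the pattern is modelled only on the two error lines in this post, so other driver versions may need a different one.

```python
import re

# Pattern modelled on the two *ERROR* lines quoted in this post.
# Assumption: other driver versions may format these messages differently.
ERROR_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+\[drm:(?P<func>\w+) \[nvidia_drm\]\] "
    r"\*ERROR\* \[nvidia-drm\] \[GPU ID (?P<gpu>0x[0-9a-fA-F]+)\] (?P<msg>.+)"
)


def find_nvidia_drm_errors(log_text):
    """Return (timestamp, function, gpu_id, message) for each nvidia-drm error line."""
    results = []
    for line in log_text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            results.append(
                (
                    float(match["time"]),
                    match["func"],
                    match["gpu"],
                    match["msg"].strip(),
                )
            )
    return results
```

Informational lines such as "Loading driver" are skipped because they lack the `[drm:<function> [nvidia_drm]]` prefix that the error lines carry.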

While it’s perhaps not surprising, I’ll go out of my way to say that whenever I see these errors, my external monitor does not work, and it doesn’t appear that the graphics card is used by the system at all, even though it’s detected by lspci.

00:00.0 Host bridge: Intel Corporation Device 4621 (rev 02)
00:02.0 VGA compatible controller: Intel Corporation Alder Lake-P Integrated Graphics Controller (rev 0c)
00:04.0 Signal processing controller: Intel Corporation Alder Lake Innovation Platform Framework Processor Participant (rev 02)
00:05.0 Multimedia controller: Intel Corporation Device 465d (rev 02)
00:06.0 PCI bridge: Intel Corporation 12th Gen Core Processor PCI Express x4 Controller #0 (rev 02)
00:07.0 PCI bridge: Intel Corporation Alder Lake-P Thunderbolt 4 PCI Express Root Port #0 (rev 02)
00:07.2 PCI bridge: Intel Corporation Alder Lake-P Thunderbolt 4 PCI Express Root Port #2 (rev 02)
00:08.0 System peripheral: Intel Corporation 12th Gen Core Processor Gaussian & Neural Accelerator (rev 02)
00:0d.0 USB controller: Intel Corporation Alder Lake-P Thunderbolt 4 USB Controller (rev 02)
00:0d.2 USB controller: Intel Corporation Alder Lake-P Thunderbolt 4 NHI #0 (rev 02)
00:0d.3 USB controller: Intel Corporation Alder Lake-P Thunderbolt 4 NHI #1 (rev 02)
00:12.0 Serial controller: Intel Corporation Device 51fc (rev 01)
00:14.0 USB controller: Intel Corporation Alder Lake PCH USB 3.2 xHCI Host Controller (rev 01)
00:14.2 RAM memory: Intel Corporation Alder Lake PCH Shared SRAM (rev 01)
00:14.3 Network controller: Intel Corporation Alder Lake-P PCH CNVi WiFi (rev 01)
00:15.0 Serial bus controller: Intel Corporation Alder Lake PCH Serial IO I2C Controller #0 (rev 01)
00:15.1 Serial bus controller: Intel Corporation Alder Lake PCH Serial IO I2C Controller #1 (rev 01)
00:16.0 Communication controller: Intel Corporation Alder Lake PCH HECI Controller (rev 01)
00:1e.0 Communication controller: Intel Corporation Alder Lake PCH UART #0 (rev 01)
00:1e.3 Serial bus controller: Intel Corporation Device 51ab (rev 01)
00:1f.0 ISA bridge: Intel Corporation Alder Lake PCH eSPI Controller (rev 01)
00:1f.3 Multimedia audio controller: Intel Corporation Alder Lake PCH-P High Definition Audio Controller (rev 01)
00:1f.4 SMBus: Intel Corporation Alder Lake PCH-P SMBus Host Controller (rev 01)
00:1f.5 Serial bus controller: Intel Corporation Alder Lake-P PCH SPI Controller (rev 01)
01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
41:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
42:01.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
42:04.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
43:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)
43:00.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1)
44:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
45:00.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
45:01.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
45:02.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
45:03.0 PCI bridge: Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 4C 2016] (rev 02)
46:00.0 USB controller: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller
47:00.0 USB controller: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller
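
The card's bus address in this listing (43:00.0) matches the `nvidia 0000:43:00.0` lines in the kernel log, which is a quick way to confirm both outputs refer to the same device. Here is a minimal Python sketch for picking the NVIDIA functions out of saved `lspci` text. The helper name `nvidia_functions` is hypothetical, and it assumes the default one-line-per-device `lspci` format shown here.

```python
def nvidia_functions(lspci_text):
    """Return (address, device_class, description) for each NVIDIA line.

    Assumes the default lspci layout: "<address> <class>: <description>".
    """
    found = []
    for line in lspci_text.splitlines():
        # Split "43:00.0 VGA compatible controller: NVIDIA ..." at the first space.
        address, _, rest = line.partition(" ")
        device_class, separator, description = rest.partition(": ")
        if separator and "NVIDIA" in description:
            found.append((address, device_class, description))
    return found
```

Run against the listing above, this should return the two functions at 43:00.0 and 43:00.1 (the GPU and its audio controller).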

I typically run GNOME 43 on Wayland, but since my reading tells me that all things NVIDIA seem to work better with X11, I installed and configured that as well. The logs from the helper script are attached.

rtx3090_arch_20221129_nvidia-beta_x11.log.gz (273.3 KB)

BIOS is fully up to date. Everything works flawlessly on Windows, which hopefully rules out hardware issues. Interestingly, nvidia-open on Arch does work, just with all the performance issues that are reported to come with it.

It’s quite possible that this is a ‘me’ problem and not actually a problem with the drivers, etc. However, I’m at a loss as to how to proceed, since it doesn’t seem like a bug with the Arch Linux package itself (though I will confess I didn’t try to install another distro just to check). I’d like to use the regular, fully-closed-source nvidia drivers pre-packaged with Arch if possible.

Any insight, internet?

RmInitAdapter failed! (0x26:0x56:1473)
Known issue. NVIDIA’s answer: won’t fix, BIOS issue. Workarounds, as you already know: driver <470, 470.57-470.82, or the open modules.
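
The version ranges above can be checked with a short Python sketch. The function names are hypothetical, and two points are assumptions rather than anything NVIDIA has stated: the 470.57-470.82 range is treated as inclusive at both ends, and any patch component (the third number) is ignored when comparing against it.

```python
def parse_version(text):
    """Turn a driver version such as "525.60.11" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def in_workaround_range(version_text):
    """True if the version falls in the ranges listed in this reply.

    Assumptions: "<470" means a major version below 470, and
    470.57-470.82 is inclusive on major.minor, ignoring any patch number.
    """
    version = parse_version(version_text)
    if version[0] < 470:
        return True
    return (470, 57) <= version[:2] <= (470, 82)
```

Under these assumptions, both versions from the original post (520.56.06 and 525.60.11) fall outside the listed ranges.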

Thanks for helping to restore my sanity!