Failed to allocate NvKmsKapiDevice and Failed to register device (Rocky 9.5 and Kernel 6.12.9)

vitaliy-kos · February 3, 2025, 12:52pm

Hello, we have run into the following problem.
Our servers previously ran Rocky 8 with kernel 5.14 (or somewhere around 6.8).
One of the servers has 6 Tesla T4 graphics cards.
nvidia-smi worked without problems and showed all of them.

We urgently needed to upgrade the system to Rocky 9 and kernel 6.8+ (in our case, yum updated it to kernel-ml 6.12.9).
I don’t remember what the nvidia-smi and CUDA versions were :(

After that, we started having problems:

nvidia-smi no longer shows all of the graphics cards as it used to.
Running nvidia-smi may cause the server to restart.
After installing nvidia-driver and restarting the server, the server could crash and restart again.

The key errors are:

[ 6.072708] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 570.86.15 Thu Jan 23 23:23:10 UTC 2025
[ 6.126896] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[ 6.169033] nvidia-uvm: Loaded the UVM driver, major device number 510.
[ 6.209091] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 570.86.15 Thu Jan 23 22:30:06 UTC 2025
[ 6.215808] [drm] [nvidia-drm] [GPU ID 0x00001b00] Loading driver
[ 7.597873] [drm] Initialized nvidia-drm 0.0.0 for 0000:1b:00.0 on minor 1
[ 7.597890] nvidia 0000:1b:00.0: [drm] No compatible format found
[ 7.597893] nvidia 0000:1b:00.0: [drm] Cannot find any crtc or sizes
[ 7.598099] [drm] [nvidia-drm] [GPU ID 0x00001c00] Loading driver
[ 8.638439] [drm] Initialized nvidia-drm 0.0.0 for 0000:1c:00.0 on minor 2
[ 8.638459] nvidia 0000:1c:00.0: [drm] No compatible format found
[ 8.638462] nvidia 0000:1c:00.0: [drm] Cannot find any crtc or sizes
[ 8.638658] [drm] [nvidia-drm] [GPU ID 0x00001e00] Loading driver
[ 9.684298] [drm] Initialized nvidia-drm 0.0.0 for 0000:1e:00.0 on minor 3
[ 9.684314] nvidia 0000:1e:00.0: [drm] No compatible format found
[ 9.684317] nvidia 0000:1e:00.0: [drm] Cannot find any crtc or sizes
[ 9.684548] [drm] [nvidia-drm] [GPU ID 0x00003f00] Loading driver
[ 10.591535] resource: resource sanity check: requesting [mem 0x00000000b7700000-0x00000000b86fffff], which spans more than PCI Bus 0000:3b [mem 0xb5000000-0xb84fffff]
[ 10.591541] caller _nv046819rm+0x3a/0xb0 [nvidia] mapping multiple BARs
[ 10.600495] [drm:nv_drm_load [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00003f00] Failed to allocate NvKmsKapiDevice
[ 10.600679] [drm:nv_drm_register_drm_device [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00003f00] Failed to register device
[ 10.600836] [drm] [nvidia-drm] [GPU ID 0x00004000] Loading driver
[ 11.819075] resource: resource sanity check: requesting [mem 0x00000000b5700000-0x00000000b66fffff], which spans more than PCI Bus 0000:40 [mem 0xb5000000-0xb63fffff]
[ 11.819083] caller _nv046819rm+0x3a/0xb0 [nvidia] mapping multiple BARs
[ 11.827706] [drm:nv_drm_load [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00004000] Failed to allocate NvKmsKapiDevice
[ 11.827827] [drm:nv_drm_register_drm_device [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00004000] Failed to register device
[ 11.827974] [drm] [nvidia-drm] [GPU ID 0x00005e00] Loading driver
[ 13.137452] [drm] Initialized nvidia-drm 0.0.0 for 0000:5e:00.0 on minor 4
[ 13.137470] nvidia 0000:5e:00.0: [drm] No compatible format found
[ 13.137473] nvidia 0000:5e:00.0: [drm] Cannot find any crtc or sizes
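For anyone reading a log like this, here is a minimal Python sketch that lists the GPUs that failed and measures how far each requested range runs past the bridge window in the "resource sanity check" lines. It assumes the nvidia-drm GPU ID encodes the PCI bus and device as 0x0000BBDD. That is only inferred from the pairs visible in this log (GPU ID 0x00001b00 next to 0000:1b:00.0), not from any documentation. The helper names are hypothetical.

```python
import re

# nvidia-drm error line; captures the GPU ID.
FAIL_RE = re.compile(
    r"\[GPU ID (0x[0-9a-fA-F]{8})\] Failed to allocate NvKmsKapiDevice"
)
# Kernel "resource sanity check" line; captures request and window ranges.
SPAN_RE = re.compile(
    r"requesting \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)\], which spans more than "
    r"PCI Bus (\S+) \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)\]"
)

def gpu_id_to_bdf(gpu_id):
    # ASSUMPTION: layout 0x0000BBDD (bus, device), inferred from this log only.
    value = int(gpu_id, 16)
    return f"{(value >> 8) & 0xFF:02x}:{value & 0xFF:02x}.0"

def failed_gpus(dmesg_text):
    return [gpu_id_to_bdf(m.group(1)) for m in FAIL_RE.finditer(dmesg_text)]

def window_overruns(dmesg_text):
    # Bytes by which each requested range ends past the bridge window.
    result = []
    for m in SPAN_RE.finditer(dmesg_text):
        _req_start, req_end, bus, _win_start, win_end = m.groups()
        result.append((bus, max(0, int(req_end, 16) - int(win_end, 16))))
    return result

SAMPLE = """\
[ 10.591535] resource: resource sanity check: requesting [mem 0x00000000b7700000-0x00000000b86fffff], which spans more than PCI Bus 0000:3b [mem 0xb5000000-0xb84fffff]
[ 10.600495] [drm:nv_drm_load [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00003f00] Failed to allocate NvKmsKapiDevice
[ 11.819075] resource: resource sanity check: requesting [mem 0x00000000b5700000-0x00000000b66fffff], which spans more than PCI Bus 0000:40 [mem 0xb5000000-0xb63fffff]
[ 11.827706] [drm:nv_drm_load [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00004000] Failed to allocate NvKmsKapiDevice
"""

for bdf in failed_gpus(SAMPLE):
    print("failed:", bdf)
for bus, over in window_overruns(SAMPLE):
    print(f"PCI Bus {bus}: request ends {over // 2**20} MiB past the window")
```

On the lines above, the two requests end 0x200000 and 0x300000 bytes past their bridge windows, and those are the same two GPUs that fail to register.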

We tried changing /etc/default/grub.
We tried installing different driver versions (those suitable for Tesla T4 | Linux 64-bit RHEL 9: 570.86.15, 550.144.03, 535.230.02, 550.127.08).
We tried installing via the .run file.
None of it helped :(

How can we solve this problem?
And is it possible for nvidia-driver to work with Rocky 9.5 and kernel 6.12.9?

vitaliy-kos · February 12, 2025, 12:15pm

nvidia-smi -L
GPU 0: Tesla T4 (UUID: GPU-7e72779e-00d4-6c68-ba84-7726909da764)
GPU 1: Tesla T4 (UUID: GPU-79405ede-bba6-5c34-d48d-1ab4d1d48a8e)
GPU 2: Tesla T4 (UUID: GPU-e6c0ca41-b425-75d8-68c3-6751367eb5b7)
GPU 3: Tesla T4 (UUID: GPU-a1165eb6-a4e4-3a89-0e91-d7e8e20d717f)
[root@scanh2-4 ~]# lspci | grep -i nvidia
1b:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
1c:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
1e:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
3f:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
40:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
5e:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
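The two listings above disagree: nvidia-smi -L reports four GPUs while lspci reports six. A small Python sketch to make that comparison scriptable, assuming the output of both commands has already been captured as text (`count_smi_gpus` and `nvidia_pci_addresses` are hypothetical helper names):

```python
import re

def count_smi_gpus(smi_text):
    # Counts lines of the form "GPU <n>: ..." printed by `nvidia-smi -L`.
    return len(re.findall(r"^GPU \d+: ", smi_text, flags=re.MULTILINE))

def nvidia_pci_addresses(lspci_text):
    # Collects the bus:device.function of every lspci line mentioning NVIDIA.
    return re.findall(
        r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) .*NVIDIA",
        lspci_text,
        flags=re.MULTILINE,
    )

SMI = """\
GPU 0: Tesla T4 (UUID: GPU-7e72779e-00d4-6c68-ba84-7726909da764)
GPU 1: Tesla T4 (UUID: GPU-79405ede-bba6-5c34-d48d-1ab4d1d48a8e)
GPU 2: Tesla T4 (UUID: GPU-e6c0ca41-b425-75d8-68c3-6751367eb5b7)
GPU 3: Tesla T4 (UUID: GPU-a1165eb6-a4e4-3a89-0e91-d7e8e20d717f)
"""

LSPCI = """\
1b:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
1c:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
1e:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
3f:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
40:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
5e:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
"""

missing = len(nvidia_pci_addresses(LSPCI)) - count_smi_gpus(SMI)
print(f"{missing} GPU(s) visible on the PCI bus but not in nvidia-smi")
```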

dmesg | grep -i 40:00.0
[ 0.691219] pci 0000:40:00.0: [10de:1eb8] type 00 class 0x030200 PCIe Endpoint
[ 0.691238] pci 0000:40:00.0: BAR 0 [mem 0xb5000000-0xb5ffffff]
[ 0.691255] pci 0000:40:00.0: BAR 1 [mem 0x3afe80000000-0x3afe8fffffff 64bit pref]
[ 0.691271] pci 0000:40:00.0: BAR 3 [mem 0x3afeb0000000-0x3afeb1ffffff 64bit pref]
[ 0.691294] pci 0000:40:00.0: enabling Extended Tags
[ 0.691321] pci 0000:40:00.0: Enabling HDA controller
[ 0.691379] pci 0000:40:00.0: PME# supported from D0 D3hot D3cold
[ 0.691418] pci 0000:40:00.0: VF BAR 0 [mem 0xb6000000-0xb603ffff]
[ 0.691420] pci 0000:40:00.0: VF BAR 0 [mem 0xb6000000-0xb63fffff]: contains BAR 0 for 16 VFs
[ 0.691429] pci 0000:40:00.0: VF BAR 1 [mem 0x3afd80000000-0x3afd8fffffff 64bit pref]
[ 0.691430] pci 0000:40:00.0: VF BAR 1 [mem 0x3afd80000000-0x3afe7fffffff 64bit pref]: contains BAR 1 for 16 VFs
[ 0.691439] pci 0000:40:00.0: VF BAR 3 [mem 0x3afe90000000-0x3afe91ffffff 64bit pref]
[ 0.691440] pci 0000:40:00.0: VF BAR 3 [mem 0x3afe90000000-0x3afeafffffff 64bit pref]: contains BAR 3 for 16 VFs
[ 0.717385] pci 0000:40:00.0: BAR 1 [mem 0x3afe80000000-0x3afe8fffffff 64bit pref]: can't claim; no compatible bridge window
[ 0.717387] pci 0000:40:00.0: BAR 3 [mem 0x3afeb0000000-0x3afeb1ffffff 64bit pref]: can't claim; no compatible bridge window
[ 0.717388] pci 0000:40:00.0: VF BAR 1 [mem 0x3afd80000000-0x3afe7fffffff 64bit pref]: can't claim; no compatible bridge window
[ 0.717390] pci 0000:40:00.0: VF BAR 3 [mem 0x3afe90000000-0x3afeafffffff 64bit pref]: can't claim; no compatible bridge window
[ 0.752175] pci 0000:40:00.0: BAR 1 [mem size 0x10000000 64bit pref]: can't assign; no space
[ 0.752176] pci 0000:40:00.0: BAR 1 [mem 0x3afe80000000-0x3afe8fffffff 64bit pref]: failed to assign
[ 0.752177] pci 0000:40:00.0: VF BAR 1 [mem size 0x100000000 64bit pref]: can't assign; no space
[ 0.752179] pci 0000:40:00.0: VF BAR 1 [mem 0x3afd80000000-0x3afe7fffffff 64bit pref]: failed to assign
[ 0.752180] pci 0000:40:00.0: BAR 3 [mem size 0x02000000 64bit pref]: can't assign; no space
[ 0.752182] pci 0000:40:00.0: BAR 3 [mem 0x3afeb0000000-0x3afeb1ffffff 64bit pref]: failed to assign
[ 0.752183] pci 0000:40:00.0: VF BAR 3 [mem size 0x20000000 64bit pref]: can't assign; no space
[ 0.752185] pci 0000:40:00.0: VF BAR 3 [mem 0x3afe90000000-0x3afeafffffff 64bit pref]: failed to assign
[ 0.752186] pci 0000:40:00.0: BAR 1 [mem size 0x10000000 64bit pref]: can't assign; no space
[ 0.752188] pci 0000:40:00.0: BAR 1 [mem 0x3afe80000000-0x3afe8fffffff 64bit pref]: failed to assign
[ 0.752189] pci 0000:40:00.0: BAR 3 [mem size 0x02000000 64bit pref]: can't assign; no space
[ 0.752190] pci 0000:40:00.0: BAR 3 [mem 0x3afeb0000000-0x3afeb1ffffff 64bit pref]: failed to assign
[ 0.752192] pci 0000:40:00.0: VF BAR 3 [mem size 0x20000000 64bit pref]: can't assign; no space
[ 0.752193] pci 0000:40:00.0: VF BAR 3 [mem 0x3afe90000000-0x3afeafffffff 64bit pref]: failed to assign
[ 0.752195] pci 0000:40:00.0: VF BAR 1 [mem size 0x100000000 64bit pref]: can't assign; no space
[ 0.752196] pci 0000:40:00.0: VF BAR 1 [mem 0x3afd80000000-0x3afe7fffffff 64bit pref]: failed to assign
[ 2.534998] nvidia 0000:40:00.0: enabling device (0100 -> 0102)
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:40:00.0)
NVRM: BAR2 is 0M @ 0x0 (PCI:0000:40:00.0)
NVRM: BAR3 is 0M @ 0x0 (PCI:0000:40:00.0)
NVRM: BAR4 is 0M @ 0x0 (PCI:0000:40:00.0)
NVRM: BAR5 is 0M @ 0x0 (PCI:0000:40:00.0)
[ 4.780729] [drm] Initialized nvidia-drm 0.0.0 for 0000:40:00.0 on minor 5
[ 135.112806] NVRM: GPU 0000:40:00.0: RmInitAdapter failed! (0x24:0x72:1568)
[ 135.112962] NVRM: GPU 0000:40:00.0: rm_init_adapter failed, device minor number 4
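To see how much address space the kernel could not place for this device, the "can't assign; no space" lines above can be summed with a short Python sketch. The regex accepts both the plain and the typographic apostrophe, since forum software often rewrites it. `unassigned_bars` is a hypothetical helper name, and the sketch only reads the log; it does not explain why the bridge window is too small.

```python
import re

ASSIGN_RE = re.compile(
    r"pci (\S+): ((?:VF )?BAR \d+) \[mem size (0x[0-9a-f]+)[^\]]*\]: "
    r"can['’]t assign; no space"
)

def unassigned_bars(dmesg_text):
    # Maps (device, BAR name) -> size in bytes; repeated lines collapse.
    sizes = {}
    for m in ASSIGN_RE.finditer(dmesg_text):
        dev, bar, size = m.groups()
        sizes[(dev, bar)] = int(size, 16)
    return sizes

SAMPLE = """\
[ 0.752175] pci 0000:40:00.0: BAR 1 [mem size 0x10000000 64bit pref]: can't assign; no space
[ 0.752177] pci 0000:40:00.0: VF BAR 1 [mem size 0x100000000 64bit pref]: can’t assign; no space
[ 0.752180] pci 0000:40:00.0: BAR 3 [mem size 0x02000000 64bit pref]: can't assign; no space
[ 0.752183] pci 0000:40:00.0: VF BAR 3 [mem size 0x20000000 64bit pref]: can't assign; no space
[ 0.752186] pci 0000:40:00.0: BAR 1 [mem size 0x10000000 64bit pref]: can't assign; no space
"""

bars = unassigned_bars(SAMPLE)
for (dev, bar), size in bars.items():
    print(f"{dev} {bar}: {size // 2**20} MiB unassigned")
print("total MiB:", sum(bars.values()) // 2**20)
```

For 0000:40:00.0 this gives 256 MiB (BAR 1), 4096 MiB (VF BAR 1), 32 MiB (BAR 3) and 512 MiB (VF BAR 3), all of them 64-bit prefetchable ranges. That matches the later NVRM lines reporting BAR1 as 0M @ 0x0.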

vitaliy-kos · March 4, 2025, 4:39pm

We have run out of options with the kernel.

We tried different 5.14 versions as well.
As for the driver, we waited for new versions, but they made no difference either!
We tried swapping the video cards, but it did not help; the only effect was that the number of video cards shown in the nvidia-smi table dropped to 3.
In the BIOS, Above 4G Decoding was enabled from the start; we did not find any other relevant settings.

Can anyone help? It’s been so long, and we still don’t have any results…

Even when 3 of the 6 video cards stopped working in nvidia-smi,
the errors were still the same:

[root@scanh2-4 ~]# dmesg | grep -i 41:00.0
[ 0.569634] pci 0000:41:00.0: [10de:1eb8] type 00 class 0x030200 PCIe Endpoint
[ 0.569654] pci 0000:41:00.0: BAR 0 [mem 0xb5000000-0xb5ffffff]
[ 0.569670] pci 0000:41:00.0: BAR 1 [mem 0x3afe80000000-0x3afe8fffffff 64bit pref]
[ 0.569687] pci 0000:41:00.0: BAR 3 [mem 0x3afeb0000000-0x3afeb1ffffff 64bit pref]
[ 0.569711] pci 0000:41:00.0: enabling Extended Tags
[ 0.569738] pci 0000:41:00.0: Enabling HDA controller
[ 0.569800] pci 0000:41:00.0: PME# supported from D0 D3hot D3cold
[ 0.569842] pci 0000:41:00.0: VF BAR 0 [mem 0xb6000000-0xb603ffff]
[ 0.569843] pci 0000:41:00.0: VF BAR 0 [mem 0xb6000000-0xb63fffff]: contains BAR 0 for 16 VFs
[ 0.569852] pci 0000:41:00.0: VF BAR 1 [mem 0x3afd80000000-0x3afd8fffffff 64bit pref]
[ 0.569854] pci 0000:41:00.0: VF BAR 1 [mem 0x3afd80000000-0x3afe7fffffff 64bit pref]: contains BAR 1 for 16 VFs
[ 0.569863] pci 0000:41:00.0: VF BAR 3 [mem 0x3afe90000000-0x3afe91ffffff 64bit pref]
[ 0.569865] pci 0000:41:00.0: VF BAR 3 [mem 0x3afe90000000-0x3afeafffffff 64bit pref]: contains BAR 3 for 16 VFs
[ 0.594141] pci 0000:41:00.0: BAR 1 [mem 0x3afe80000000-0x3afe8fffffff 64bit pref]: can't claim; no compatible bridge window
[ 0.594143] pci 0000:41:00.0: BAR 3 [mem 0x3afeb0000000-0x3afeb1ffffff 64bit pref]: can't claim; no compatible bridge window
[ 0.594144] pci 0000:41:00.0: VF BAR 1 [mem 0x3afd80000000-0x3afe7fffffff 64bit pref]: can't claim; no compatible bridge window
[ 0.594145] pci 0000:41:00.0: VF BAR 3 [mem 0x3afe90000000-0x3afeafffffff 64bit pref]: can't claim; no compatible bridge window
[ 0.628696] pci 0000:41:00.0: BAR 1 [mem size 0x10000000 64bit pref]: can't assign; no space
[ 0.628698] pci 0000:41:00.0: BAR 1 [mem 0x3afe80000000-0x3afe8fffffff 64bit pref]: failed to assign
[ 0.628699] pci 0000:41:00.0: VF BAR 1 [mem size 0x100000000 64bit pref]: can't assign; no space
[ 0.628700] pci 0000:41:00.0: VF BAR 1 [mem 0x3afd80000000-0x3afe7fffffff 64bit pref]: failed to assign
[ 0.628702] pci 0000:41:00.0: BAR 3 [mem size 0x02000000 64bit pref]: can't assign; no space
[ 0.628703] pci 0000:41:00.0: BAR 3 [mem 0x3afeb0000000-0x3afeb1ffffff 64bit pref]: failed to assign
[ 0.628705] pci 0000:41:00.0: VF BAR 3 [mem size 0x20000000 64bit pref]: can't assign; no space
[ 0.628706] pci 0000:41:00.0: VF BAR 3 [mem 0x3afe90000000-0x3afeafffffff 64bit pref]: failed to assign
[ 0.628708] pci 0000:41:00.0: BAR 1 [mem size 0x10000000 64bit pref]: can't assign; no space
[ 0.628709] pci 0000:41:00.0: BAR 1 [mem 0x3afe80000000-0x3afe8fffffff 64bit pref]: failed to assign
[ 0.628710] pci 0000:41:00.0: BAR 3 [mem size 0x02000000 64bit pref]: can't assign; no space
[ 0.628712] pci 0000:41:00.0: BAR 3 [mem 0x3afeb0000000-0x3afeb1ffffff 64bit pref]: failed to assign
[ 0.628713] pci 0000:41:00.0: VF BAR 3 [mem size 0x20000000 64bit pref]: can't assign; no space
[ 0.628714] pci 0000:41:00.0: VF BAR 3 [mem 0x3afe90000000-0x3afeafffffff 64bit pref]: failed to assign
[ 0.628716] pci 0000:41:00.0: VF BAR 1 [mem size 0x100000000 64bit pref]: can't assign; no space
[ 0.628717] pci 0000:41:00.0: VF BAR 1 [mem 0x3afd80000000-0x3afe7fffffff 64bit pref]: failed to assign
[ 4.183433] nvidia 0000:41:00.0: enabling device (0100 -> 0102)
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:41:00.0)
NVRM: BAR2 is 0M @ 0x0 (PCI:0000:41:00.0)
NVRM: BAR3 is 0M @ 0x0 (PCI:0000:41:00.0)
NVRM: BAR4 is 0M @ 0x0 (PCI:0000:41:00.0)
NVRM: BAR5 is 0M @ 0x0 (PCI:0000:41:00.0)
[ 4.308053] [drm] Initialized nvidia-drm 0.0.0 for 0000:41:00.0 on minor 5
[ 29.003897] NVRM: GPU 0000:41:00.0: RmInitAdapter failed! (0x24:0x72:1513)
[ 29.004003] NVRM: GPU 0000:41:00.0: rm_init_adapter failed, device minor number 4
[ 1386.646514] NVRM: GPU 0000:41:00.0: RmInitAdapter failed! (0x62:0x40:2521)
[ 1386.647158] NVRM: GPU 0000:41:00.0: rm_init_adapter failed, device minor number 4
[ 2292.508048] NVRM: GPU 0000:41:00.0: RmInitAdapter failed! (0x62:0x40:2521)
[ 2292.508175] NVRM: GPU 0000:41:00.0: rm_init_adapter failed, device minor number 4
[ 2867.208641] NVRM: GPU 0000:41:00.0: RmInitAdapter failed! (0x62:0x40:2521)
[ 2867.208744] NVRM: GPU 0000:41:00.0: rm_init_adapter failed, device minor number 4
[ 3404.059292] NVRM: GPU 0000:41:00.0: RmInitAdapter failed! (0x62:0x40:2521)
[ 3404.059426] NVRM: GPU 0000:41:00.0: rm_init_adapter failed, device minor number 4
[ 3421.906454] NVRM: GPU 0000:41:00.0: RmInitAdapter failed! (0x62:0x40:2521)
[ 3421.907209] NVRM: GPU 0000:41:00.0: rm_init_adapter failed, device minor number 4
[ 5989.456130] NVRM: GPU 0000:41:00.0: RmInitAdapter failed! (0x62:0x40:2521)
[ 5989.456829] NVRM: GPU 0000:41:00.0: rm_init_adapter failed, device minor number 4
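The repeated RmInitAdapter lines above are easier to compare between boots when tallied per device and per code. A minimal Python sketch, with `rm_init_failures` as a hypothetical helper name; it only counts the tuples printed in parentheses and makes no claim about what they mean:

```python
import re
from collections import Counter

RM_RE = re.compile(
    r"NVRM: GPU (\S+): RmInitAdapter failed! "
    r"\((0x[0-9a-f]+:0x[0-9a-f]+:\d+)\)"
)

def rm_init_failures(dmesg_text):
    # Counter keyed by (PCI address, code tuple as printed in the log).
    return Counter((m.group(1), m.group(2)) for m in RM_RE.finditer(dmesg_text))

SAMPLE = """\
[ 29.003897] NVRM: GPU 0000:41:00.0: RmInitAdapter failed! (0x24:0x72:1513)
[ 29.004003] NVRM: GPU 0000:41:00.0: rm_init_adapter failed, device minor number 4
[ 1386.646514] NVRM: GPU 0000:41:00.0: RmInitAdapter failed! (0x62:0x40:2521)
[ 2292.508048] NVRM: GPU 0000:41:00.0: RmInitAdapter failed! (0x62:0x40:2521)
"""

for (dev, code), count in rm_init_failures(SAMPLE).items():
    print(f"{dev} {code}: {count} time(s)")
```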

vitaliy-kos · March 6, 2025, 2:20pm

Even when we built and installed kernel 6.8.5 ourselves, it didn’t work for us.

Finally, we realized what the problem was:
we had changed /etc/default/grub dozens of times, but as it turned out, the changes never took effect.
Running cat /proc/cmdline shows the parameters the current kernel was actually booted with, and our new parameters were not there.

The cause is the GRUB_ENABLE_BLSCFG parameter in /etc/default/grub: if it is true, then grub takes its data from
/boot/loader/entries/…- yourKernelVersion
After we set it to false, the problem on the 6.8.5 kernel was resolved.
(However, I have not managed to solve it on 6.13.5.)
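A rough Python sketch of how one might check which source the kernel arguments come from. It assumes /etc/default/grub uses plain KEY=value lines and that a Boot Loader Specification entry keeps its arguments on a line starting with `options`. The sample texts are made up for illustration and are not taken from our server; `blscfg_enabled` and `bls_options` are hypothetical helper names.

```python
def blscfg_enabled(grub_default_text):
    # True if the last uncommented GRUB_ENABLE_BLSCFG assignment is "true".
    # ASSUMPTION of this sketch: a missing setting is treated as False.
    enabled = False
    for raw in grub_default_text.splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "GRUB_ENABLE_BLSCFG":
            enabled = value.strip().strip("\"'").lower() == "true"
    return enabled

def bls_options(entry_text):
    # Kernel arguments from the "options" line of one BLS entry file.
    for line in entry_text.splitlines():
        parts = line.split(None, 1)
        if parts and parts[0] == "options":
            return parts[1].split() if len(parts) > 1 else []
    return []

# Illustrative inputs only (hypothetical values).
GRUB_DEFAULT = 'GRUB_TIMEOUT=5\nGRUB_CMDLINE_LINUX="pci=realloc=on"\nGRUB_ENABLE_BLSCFG=true\n'
BLS_ENTRY = "title Example entry\nlinux /vmlinuz-example\noptions root=/dev/sda1 ro quiet\n"

if blscfg_enabled(GRUB_DEFAULT):
    print("BLS is on; entry options:", bls_options(BLS_ENTRY))
```

In this made-up sample, pci=realloc=on is present in GRUB_CMDLINE_LINUX but absent from the entry's options line, which is the same mismatch we ran into.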

our parameters in grub are:
pci=use_crs pci=realloc=on pci=assign-busses
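To avoid repeating our mistake, it is worth confirming after each reboot that these parameters actually reached the running kernel. A minimal Python sketch: `missing_kernel_params` is a hypothetical helper name, and the sample command line is invented for illustration. On a real system the string would be the contents of /proc/cmdline.

```python
EXPECTED = ["pci=use_crs", "pci=realloc=on", "pci=assign-busses"]

def missing_kernel_params(cmdline, expected):
    # Returns the expected parameters that are absent from the command line.
    present = set(cmdline.split())
    return [param for param in expected if param not in present]

def read_cmdline(path="/proc/cmdline"):
    # Not called here, so the sketch also runs on machines without /proc.
    with open(path) as handle:
        return handle.read()

# Hypothetical command line where only one of the three parameters arrived.
sample_cmdline = "BOOT_IMAGE=/vmlinuz-example root=/dev/sda1 ro pci=realloc=on"
print("missing:", missing_kernel_params(sample_cmdline, EXPECTED))
```

An empty list means all three parameters are active; anything else means the grub change was not applied to the entry that was booted.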

system · March 20, 2025, 2:21pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
