Debian 12, 525.147.05 drivers, no files under /dev

ahlawatkaran12 · February 20, 2024, 5:14am

I have been facing problems with getting my GPU running for quite some time now. Previously, everything was working well, but I suspect an update broke something.

I am on the 525.147.05 drivers, installed from the Debian repositories using the nvidia-driver package. At first, nvidia-smi said No devices found, and dmesg showed Failed to allocate NvKmsKapiDevice along with an RmInitAdapter failure.
Recently, however, I get the following:

NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running

Looking at the status of nvidia-persistenced, it says:

Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 115 has read and write permis…

Checking /dev, it indeed doesn’t have those files.
I have tried older drivers, tried 535 from the backports repository, and tried a clean reinstall of the 525 drivers, all to no avail.
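
For anyone repeating the /dev check described in this post, here is a minimal Python sketch. The function name and the dev_dir parameter are made up for illustration; the only thing taken from the thread is the /dev/nvidia* naming mentioned in the nvidia-persistenced message.

```python
from pathlib import Path


def nvidia_device_nodes(dev_dir="/dev"):
    """Return the sorted names of all nvidia* entries under dev_dir.

    dev_dir is a parameter only so the check can be pointed at a test
    directory; on a real system the default is what you want.
    """
    return sorted(p.name for p in Path(dev_dir).glob("nvidia*"))
```

Calling nvidia_device_nodes() with no arguments and getting an empty list back matches the situation above: no device files, which usually points at the kernel module not being loaded or failing to initialise.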

I am coming to this forum as sort of my last hope, since no one who has been kind enough to help me so far has been able to pinpoint what’s going on.

Here are the logs from nvidia-bug-report.sh
nvidia-bug-report.log.gz (240.7 KB)

roliverio · February 20, 2024, 1:53pm

It seems that the NVIDIA module is not loaded, which makes me suspect that it didn’t build. Have you checked for the presence of the module(s) under /lib/modules/$YOUR_KERNEL_VERSION/updates/dkms?

If they’re not there, the most common cause is that your kernel headers are not installed.
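
The two checks suggested here (built modules and kernel headers) could be scripted roughly as follows. This is only a sketch: the function names and the *_root parameters are invented for illustration, and /usr/src/linux-headers-<release> is assumed to be where the headers live, as is usual on Debian.

```python
import platform
from pathlib import Path


def dkms_nvidia_modules(kernel_release=None, modules_root="/lib/modules"):
    """List nvidia* files under <modules_root>/<release>/updates/dkms."""
    release = kernel_release or platform.release()
    dkms_dir = Path(modules_root) / release / "updates" / "dkms"
    if not dkms_dir.is_dir():
        return []
    # Match by prefix so compressed modules (e.g. .ko.xz) are included too.
    return sorted(p.name for p in dkms_dir.iterdir() if p.name.startswith("nvidia"))


def headers_present(kernel_release=None, src_root="/usr/src"):
    """True if a linux-headers-<release> directory exists (assumed Debian layout)."""
    release = kernel_release or platform.release()
    return (Path(src_root) / f"linux-headers-{release}").is_dir()
```

With no arguments both functions use the running kernel's release string, which is the same value uname -r prints.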

ahlawatkaran12 · February 20, 2024, 4:22pm

So, I do have the modules under the mentioned location. The following are present

nvidia-current-drm.ko
nvidia-current-modeset.ko
nvidia-current-peermem.ko
nvidia-current-uvm.ko
nvidia-current.ko

Also, the Linux headers are installed; I checked with ls -l /usr/src/linux-headers-$(uname -r)

generix · February 21, 2024, 8:33am

Please disable secure boot in bios.

ahlawatkaran12 · February 21, 2024, 4:10pm

I have disabled secure boot, which has changed the error I am now getting.
When running nvidia-smi, it says No devices found.
Running sudo dmesg | grep -i nvidia shows the following

[    0.030464] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.5.0-0.deb12.4-amd64 root=UUID=bd7a24c0-9057-49f1-ab49-48506df0c89d ro rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1 quiet splash
[    5.057323] nvidia: loading out-of-tree module taints kernel.
[    5.057330] nvidia: module license 'NVIDIA' taints kernel.
[    5.057333] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    5.057334] nvidia: module license taints kernel.
[    5.238540] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[    5.239182] nvidia 0000:01:00.0: enabling device (0006 -> 0007)
[    5.239268] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[    5.312024] audit: type=1400 audit(1708531242.821:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=676 comm="apparmor_parser"
[    5.312027] audit: type=1400 audit(1708531242.821:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=676 comm="apparmor_parser"
[    6.004662] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  525.147.05  Wed Oct 25 20:21:31 UTC 2023
[    6.641537] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[    6.673948] [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
[    6.674023] [drm:nv_drm_probe_devices [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device

Here is the bug report I generated after disabling secure boot.
nvidia-bug-report.log.gz (271.6 KB)

generix · February 21, 2024, 5:04pm

[    6.673695] ACPI BIOS Error (bug): AE_AML_BUFFER_LIMIT, Field [TMPB] at bit offset/length 1572864/32768 exceeds size of target Buffer (262144 bits) (20230331/dsopcode-198)
[    6.673699] ACPI Error: Aborting method \_SB.PCI0.PEG0.PEGP._ROM due to previous error (AE_AML_BUFFER_LIMIT) (20230331/psparse-529)
[    6.673716] NVRM: GPU 0000:01:00.0: Failed to copy vbios to system memory.
[    6.673809] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x30:0xffff:974)
[    6.673853] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0

With that message, the gpu is most likely broken. You might want to install Windows to double-check.
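
To spot the same lines in another dmesg dump, a small Python filter like the one below might help. It is a sketch only: the patterns are copied from the excerpt above, while the labels and the function name are my own wording and not anything the driver reports.

```python
import re

# Patterns come from the log excerpt in this post; labels are illustrative.
SIGNATURES = [
    ("acpi_rom_method_aborted", re.compile(r"Aborting method \S*_ROM")),
    ("vbios_copy_failed", re.compile(r"Failed to copy vbios to system memory")),
    ("rm_init_adapter_failed", re.compile(r"RmInitAdapter failed")),
]


def classify_dmesg(text):
    """Return the labels of matching signatures, in order of first appearance."""
    hits = []
    for line in text.splitlines():
        for label, pattern in SIGNATURES:
            if label not in hits and pattern.search(line):
                hits.append(label)
    return hits
```

Feeding it the saved output of sudo dmesg and getting all three labels back would indicate the same failure chain as in this thread: the ACPI _ROM method aborts, the driver cannot copy the vbios, and adapter initialisation fails.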

ahlawatkaran12 · February 21, 2024, 5:10pm

Alright, I’ll check and post the findings here this weekend

generix · February 21, 2024, 5:11pm

I guess you should re-enable secure boot to disable the driver meanwhile.

ahlawatkaran12 · March 2, 2024, 10:59am

Sorry, I totally forgot to update this thread!
So over the last week I installed Windows and found out that my GPU might really be dead. Device Manager shows that the Nvidia GPU did not start because of some errors it reported. Reinstalling the latest drivers from Nvidia’s website did not help either.

The exact code reported was code 43, with error code shown as 0000002B. Not sure if those numbers mean much to anyone here, but yeah, that’s what I found.

generix · March 2, 2024, 2:22pm

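Code 43 means Windows stopped the device because it reported problems. Together with the vbios errors above, that means the gpu is dead.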

ahlawatkaran12 · March 2, 2024, 2:25pm

Well then, thank you for your time. I’ll see what steps I can take from here :(

generix · March 2, 2024, 2:52pm

Since it’s an issue with the vbios, there’s a very slim chance this can be fixed with a reflash. To check you would need to use nouveau to debug:
https://forums.developer.nvidia.com/t/nvidia-geforce-gtx-1650-no-external-monitor-not-detected-in-xrandr-opensuse-leap-15-3/215296/19

ahlawatkaran12 · March 2, 2024, 4:07pm

I tried my best to follow the instructions, so here’s what I got in dmesg
dmesg.txt (91.3 KB)

generix · March 4, 2024, 12:28pm

[    2.161249] nouveau 0000:01:00.0: bios: trying PRAMIN...
[    2.161253] nouveau 0000:01:00.0: bios: ... not enabled
[    2.161254] nouveau 0000:01:00.0: bios: trying PROM...
[    2.162340] nouveau 0000:01:00.0: bios: 00000000: ROM signature (0000) unknown
[    2.162341] nouveau 0000:01:00.0: bios: image 0 invalid
[    2.162343] nouveau 0000:01:00.0: bios: scored 0
[    2.162344] nouveau 0000:01:00.0: bios: trying ACPI...
[    2.162616] nouveau 0000:01:00.0: bios: 00000000: ROM signature (0000) unknown
[    2.162618] nouveau 0000:01:00.0: bios: image 0 invalid
[    2.162618] nouveau 0000:01:00.0: bios: scored 0
[    2.162619] nouveau 0000:01:00.0: bios: trying ACPI...
[    2.162865] nouveau 0000:01:00.0: bios: 00000000: ROM signature (0000) unknown
[    2.162866] nouveau 0000:01:00.0: bios: image 0 invalid
[    2.162867] nouveau 0000:01:00.0: bios: scored 0
[    2.162867] nouveau 0000:01:00.0: bios: trying PCIROM...
[    2.162878] nouveau 0000:01:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0x0000
[    2.162883] nouveau 0000:01:00.0: bios: trying PLATFORM...
[    2.162884] nouveau 0000:01:00.0: bios: unable to locate usable image

nouveau reads nothing but zeros from every source it tries, so I guess the rom is dead.
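
For reference, the check nouveau is reporting on boils down to the first two bytes of the image: a PCI expansion ROM starts with 0x55 0xAA, which reads as 0xaa55 when taken as a little-endian 16-bit value. A minimal Python sketch of that test, with function names invented for illustration:

```python
def has_pci_rom_signature(image: bytes) -> bool:
    """True if the image starts with 0x55 0xAA (0xaa55 as little-endian u16)."""
    return len(image) >= 2 and int.from_bytes(image[:2], "little") == 0xAA55


def is_all_zero(image: bytes) -> bool:
    """True if the image is non-empty and every byte is zero."""
    return len(image) > 0 and not any(image)
```

The log above corresponds to has_pci_rom_signature returning False and is_all_zero returning True. If you run the same check on a vbios file from disk, keep in mind that a dump may carry a vendor header ahead of the signature, so a False there is a reason to look closer and not a verdict.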

ahlawatkaran12 · March 4, 2024, 12:54pm

Is that something a flash would be able to fix as you suggested before? I’m not at all familiar with GPU architecture unfortunately

generix · March 4, 2024, 1:25pm

If the flash rom is intact and only the vbios image stored on it is corrupt, a reflash might be able to fix it. In your case, the flash rom itself seems to be broken, so there’s nothing to flash.
You could try to reflash it anyway using nvflash, or try loading the vbios from a file:
https://blog.umito.nl/2014/04/13/getting-a-romless-mxm-card-to-work-on-ubuntu-in-a-laptop-with-no-bios-support-for-it.html
This should be the correct vbios file for your laptop:
https://www.techpowerup.com/vgabios/222797/222797

ahlawatkaran12 · March 4, 2024, 2:11pm

Tried running nvflash, it’s complaining about no EEPROM being found or supported. Do I need the nouveau module installed and loaded during this process?

generix · March 4, 2024, 2:23pm

So reflashing won’t work, the flash rom is gone.
You can only try to load it from disk on every boot.

ahlawatkaran12 · March 4, 2024, 2:24pm

That doesn’t sound like a good solution, what do you think?

generix · March 4, 2024, 2:30pm

Of course not, it’s a fiddly makeshift to keep a notebook with an nvidia gpu alive. Two alternatives:

keep using the notebook without the nvidia gpu.
buy a new one.
