525.85.12 driver fails to access mobile 3080 Ti

Recently Debian pushed the 525.85.12 driver into the Testing release, and after that happened, I lost access to my NVIDIA GPU (the embedded Intel one still works).

There isn’t much information about it in journalctl. The key lines seem to be:

$ sudo journalctl -b0 -p debug -u nvidia-persistenced
Feb 26 17:48:31 aw systemd[1]: Starting nvidia-persistenced.service - NVIDIA Persistence Daemon...
Feb 26 17:48:31 aw nvidia-persistenced[681]: Started (681)
Feb 26 17:48:36 aw nvidia-persistenced[681]: device 0000:01:00.0 - failed to open.
Feb 26 17:48:38 aw systemd[1]: Started nvidia-persistenced.service - NVIDIA Persistence Daemon.

After the module loads, the driver attempts to load firmware on every attempt to access the GPU. For example, whenever I try to query NVIDIA-related info (e.g. via glxinfo):

$ __GLX_VENDOR_LIBRARY_NAME=nvidia __NV_PRIME_RENDER_OFFLOAD=1 __VK_LAYER_NV_optimus=NVIDIA_only glxinfo
name of display: :0
X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:  152 (GLX)
  Minor opcode of failed request:  24 (X_GLXCreateNewContext)
  Value in failed request:  0x0
  Serial number of failed request:  50
  Current serial number in output stream:  51

The following entries are logged:

Feb 26 22:15:10 aw kernel: nvidia 0000:01:00.0: firmware: direct-loading firmware nvidia/525.85.12/gsp_tu10x.bin
Feb 26 22:15:12 aw kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x0:1835)
Feb 26 22:15:12 aw kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Feb 26 22:15:12 aw kernel: nvidia 0000:01:00.0: firmware: direct-loading firmware nvidia/525.85.12/gsp_tu10x.bin
Feb 26 22:15:14 aw kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x0:1835)
Feb 26 22:15:14 aw kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Feb 26 22:15:14 aw kernel: nvidia 0000:01:00.0: firmware: direct-loading firmware nvidia/525.85.12/gsp_tu10x.bin
Feb 26 22:15:17 aw kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x0:1835)
Feb 26 22:15:17 aw kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Feb 26 22:15:17 aw kernel: nvidia 0000:01:00.0: firmware: direct-loading firmware nvidia/525.85.12/gsp_tu10x.bin
Feb 26 22:15:19 aw kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x0:1835)
Feb 26 22:15:19 aw kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0

It takes about 10 seconds for glxinfo to conclude that the info cannot be fetched. Likewise, it takes about 10 seconds to load GDM, and to log into an Xorg session from GDM.

Since there is not much info pointing at where the issue lies, can anyone suggest where someone who is not a graphics or systems developer could look?

I tried to find something relevant on Google, but everything I found so far was unrelated to the open-source kernel module, and always had some extra info logged before the failure to open the device.

More info on my hardware setup is available in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1032003

nvidia-bug-report.log.gz (142.0 KB)

Added the bug report log, since I couldn’t attach it while the post was on premoderation.

That’s the wrong driver: you’re using the “open kernel modules”.

I am unsure what that means. Is 525.85.12’s userspace part usable only with the non-open kernel module? Or is the open kernel module supposed to co-exist with the non-open one?

Debian’s maintainers specified nvidia-driver’s package dependency this way:

nvidia-kernel-dkms (= 525.85.12-1) | nvidia-kernel-525.85.12 | nvidia-open-kernel-525.85.12 | nvidia-open-kernel-525.85.12, nvidia-support

Which implies that either the open-source or the non-open-source module will do. For reasons unknown to me, the upgrade path switched from the non-open module (which was the only option available in the previous version) to the open one.

edit: switching to the non-open kernel module fixed the issue. However, it’d be nice to understand what the issue with the open one is.

By default, the open kernel modules only work on compute hardware (the former Tesla line). They can also be enabled on all Turing and newer GPUs by setting a module option, but they are not feature-complete there, so they shouldn’t be used, especially on mobile GPUs:
https://github.com/NVIDIA/open-gpu-kernel-modules
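For reference, the module option being referred to is presumably `NVreg_OpenRmEnableUnsupportedGpus` (documented in the README of the repository linked above for 525-era drivers); a minimal sketch of a modprobe.d fragment that opts a GeForce/Turing+ GPU into the open modules would look like:

```
# /etc/modprobe.d/nvidia-open.conf (hypothetical filename)
# Opt the open kernel modules in on GPUs outside the default (datacenter)
# support list. NVreg_OpenRmEnableUnsupportedGpus is the option named in the
# open-gpu-kernel-modules README for this driver generation; verify it against
# the README of your installed version before using.
options nvidia NVreg_OpenRmEnableUnsupportedGpus=1
```

Given the reply above, though, doing this on a mobile GPU is exactly what is being advised against.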

