Debian recently pushed the 525.85.12 driver into Testing, and since then I have lost access to my NVIDIA GPU (the integrated Intel one still works).
There isn't much information about it in journalctl. The key lines seem to be:
$ sudo journalctl -b0 -p debug -u nvidia-persistenced
Feb 26 17:48:31 aw systemd[1]: Starting nvidia-persistenced.service - NVIDIA Persistence Daemon...
Feb 26 17:48:31 aw nvidia-persistenced[681]: Started (681)
Feb 26 17:48:36 aw nvidia-persistenced[681]: device 0000:01:00.0 - failed to open.
Feb 26 17:48:38 aw systemd[1]: Started nvidia-persistenced.service - NVIDIA Persistence Daemon.
Once the module is loaded, it tries to load the firmware again on every attempt to access the GPU, for example whenever I query NVIDIA-related information (e.g. via glxinfo):
$ __GLX_VENDOR_LIBRARY_NAME=nvidia __NV_PRIME_RENDER_OFFLOAD=1 __VK_LAYER_NV_optimus=NVIDIA_only glxinfo
name of display: :0
X Error of failed request: BadValue (integer parameter out of range for operation)
Major opcode of failed request: 152 (GLX)
Minor opcode of failed request: 24 (X_GLXCreateNewContext)
Value in failed request: 0x0
Serial number of failed request: 50
Current serial number in output stream: 51
The following entries are logged:
Feb 26 22:15:10 aw kernel: nvidia 0000:01:00.0: firmware: direct-loading firmware nvidia/525.85.12/gsp_tu10x.bin
Feb 26 22:15:12 aw kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x0:1835)
Feb 26 22:15:12 aw kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Feb 26 22:15:12 aw kernel: nvidia 0000:01:00.0: firmware: direct-loading firmware nvidia/525.85.12/gsp_tu10x.bin
Feb 26 22:15:14 aw kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x0:1835)
Feb 26 22:15:14 aw kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Feb 26 22:15:14 aw kernel: nvidia 0000:01:00.0: firmware: direct-loading firmware nvidia/525.85.12/gsp_tu10x.bin
Feb 26 22:15:17 aw kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x0:1835)
Feb 26 22:15:17 aw kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
Feb 26 22:15:17 aw kernel: nvidia 0000:01:00.0: firmware: direct-loading firmware nvidia/525.85.12/gsp_tu10x.bin
Feb 26 22:15:19 aw kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x0:1835)
Feb 26 22:15:19 aw kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
It takes about 10 seconds for glxinfo to conclude that the information cannot be fetched. Likewise, it takes about 10 seconds to load GDM and to log into an Xorg session from it.
Since there is little information pointing at where the issue is, can anyone suggest where someone who is not a graphics or systems developer could start looking?
I tried searching Google for something relevant, but everything I have found so far was unrelated to the open-source kernel module and always had some extra information logged before the failure to open the device.
More information on my hardware setup is available at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1032003
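The roughly 10-second delay lines up with the kernel log: four firmware-load / RmInitAdapter failure cycles between 22:15:10 and 22:15:19. A minimal sketch for counting those retries and measuring their time span from journalctl output is below. The function name summarize_nvrm_failures is hypothetical, and it assumes the default journalctl short format, where the HH:MM:SS timestamp is the third whitespace-separated field.

```shell
#!/bin/sh
# Hypothetical helper: summarize repeated GSP firmware loads and
# RmInitAdapter failures from journalctl-style kernel log text on stdin.
# Assumes the default short output format (time in field 3) and does not
# handle logs that span midnight.
summarize_nvrm_failures() {
  awk '
    # Only look at the firmware-load and init-failure lines.
    /gsp_tu10x\.bin|RmInitAdapter failed/ {
      # Convert the HH:MM:SS timestamp in field 3 to seconds.
      split($3, t, ":")
      secs = t[1] * 3600 + t[2] * 60 + t[3]
      # Remember the first and last matching timestamps.
      if (!seen++) first = secs
      last = secs
      # Count only the actual init failures.
      if ($0 ~ /RmInitAdapter failed/) failures++
    }
    END {
      printf "failures=%d span=%ds\n", failures, last - first
    }'
}

# Example (assumption: run right after triggering the failure, e.g. with glxinfo):
# sudo journalctl -k -b0 | summarize_nvrm_failures
```

Fed the kernel lines quoted above, it prints failures=4 span=9s, which matches the delay observed with glxinfo.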