Quadro M2200: RmInitAdapter failed! (0x25:0x65:1457)

I just upgraded a Lenovo P51 with a:

01:00.0 VGA compatible controller: NVIDIA Corporation GM206GLM [Quadro M2200 Mobile] (rev a1)

from Debian 11 to 12, which upgrades the nvidia driver from 470.182.03 to 525.105.17. Upon reboot the screen goes black after some boot messages and I can’t even switch to a virtual terminal. dmesg/syslog shows:

2023-06-13T01:32:27.315+00:00 vent kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x65:1457)
2023-06-13T01:32:27.315+00:00 vent kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0       
2023-06-13T01:32:27.315+00:00 vent systemd[1]: lightdm.service: Failed with result 'exit-code'.                        
2023-06-13T01:32:28.315+00:00 vent kernel: NVRM: GPU at PCI:0000:01:00: GPU-f2c01c85-c8a5-37ff-129f-eba0798a8837       
2023-06-13T01:32:28.315+00:00 vent kernel: NVRM: Xid (PCI:0000:01:00): 61, pid='<unknown>', name=<unknown>, 0a99(182c) 00000000 00000000

I also tried the driver in experimental, version 530.41.03, which fails in a similar fashion:

2023-06-13T02:02:39.056+00:00 vent kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x25:0x65:1462)              
2023-06-13T02:02:39.056+00:00 vent kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0       
2023-06-13T02:02:39.056+00:00 vent systemd[1]: lightdm.service: Failed with result 'exit-code'.                        
2023-06-13T02:02:40.056+00:00 vent kernel: NVRM: GPU at PCI:0000:01:00: GPU-f2c01c85-c8a5-37ff-129f-eba0798a8837       
2023-06-13T02:02:40.056+00:00 vent kernel: NVRM: Xid (PCI:0000:01:00): 61, pid='<unknown>', name=<unknown>, 0a99(1804) 00000000 00000000
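
To pull just these lines out of a longer log, here is a minimal sketch. The function name nvrm_failures is made up for illustration, and the pattern only covers the messages quoted above; it is not an exhaustive list of NVRM errors.

```shell
# Hypothetical helper: keep only the NVRM lines that report an adapter
# init failure or an Xid event. Reads log text on stdin, so it can be fed
# from dmesg, journalctl -k, or a saved syslog file.
nvrm_failures() {
    grep -E 'NVRM: .*(RmInitAdapter failed|rm_init_adapter failed|Xid \()'
}

# Example (as root, not run here):
#   journalctl -k -b | nvrm_failures
```

Reading from stdin keeps the filter independent of where the log lives, which helps when comparing a saved log from one driver version against another.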

This is with linux 6.1.0-9-amd64, but I also tried 5.10, which is what I was using prior to the upgrade. No difference.

Upgraded to the latest BIOS (1.60). Verified that the display is still set to dedicated.

nvidia-bug-report.sh hangs so I interrupted it after a while with ctrl-c and attached what was written.

Last time I uploaded one of these files your forum marked it as a virus (it’s obviously not). Here is the startx output:

xauth:  file /root/.Xauthority does not exist

X.Org X Server 1.21.1.7
X Protocol Version 11, Revision 0
Current Operating System: Linux vent 6.1.0-9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.27-1 (2023-05-08) x86_64
Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.1.0-9-amd64 root=UUID=fa6ff053-976a-434c-87ae-2422af36f535 ro nosplash quiet transparent_hugepage=madvise
xorg-server 2:21.1.7-3 (https://www.debian.org/support) 
Current version of pixman: 0.42.2
        Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.1.log", Time: Wed Jun 14 05:23:51 2023
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
(EE) 
Fatal server error:
(EE) no screens found(EE) 
(EE) 
Please consult the The X.Org Foundation support 
         at http://wiki.x.org
 for help. 
(EE) Please also check the log file at "/var/log/Xorg.1.log" for additional information.
(EE) 

waiting for X server to begin accepting connections (EE) Server terminated with error (1). Closing log file.

xinit: giving up
xinit: unable to connect to X server: Connection reset by peer
xinit: server error

The GPU is supported per the info supplied, but I also tried the following option, which made no difference:

options nvidia NVreg_OpenRmEnableUnsupportedGpus=1

nvidia-bug-report.log.gz (77.4 KB)

The only work-around that I have found is removing the nvidia driver and relying on nouveau, which is not great.

README.txt refers to "NVIDIA GPU fails to initialize on Red Hat Enterprise Linux 8 - Red Hat Customer Portal", but that appears to be behind a paywall.

Does it work if you downgrade to driver 470?

Yes, downgrading works.

This appears to be a new (minor) error logged at the wrong level (II):

[     6.295] (II) NVIDIA(0): ACPI: failed to connect to the ACPI event daemon; the daemon
[     6.295] (II) NVIDIA(0):     may not be running or the "AcpidSocketPath" X
[     6.295] (II) NVIDIA(0):     configuration option may not be set correctly.  When the
[     6.295] (II) NVIDIA(0):     ACPI event daemon is available, the NVIDIA X driver will
[     6.295] (II) NVIDIA(0):     try to use it to receive ACPI event notifications.  For
[     6.295] (II) NVIDIA(0):     details, please see the "ConnectToAcpid" and
[     6.295] (II) NVIDIA(0):     "AcpidSocketPath" X configuration options in Appendix B: X

In case others need this here is what I did:

Modify /etc/apt/sources.list to add:

deb https://deb.debian.org/debian bullseye main contrib non-free

Then as root:

# apt update
# apt install nvidia-driver/bullseye nvidia-driver-libs/bullseye nvidia-driver-bin/bullseye xserver-xorg-video-nvidia/bullseye nvidia-vdpau-driver/bullseye nvidia-kernel-dkms/bullseye libgl1-nvidia-glvnd-glx/bullseye nvidia-egl-icd/bullseye libnvidia-eglcore/bullseye libglx-nvidia0/bullseye nvidia-alternative/bullseye libnvidia-glcore/bullseye nvidia-kernel-support/bullseye libxnvctrl0/bullseye nvidia-persistenced/bullseye nvidia-settings/bullseye libnvidia-cfg1/bullseye
# apt-mark hold nvidia-driver nvidia-persistenced libxnvctrl0 glx-alternative-mesa glx-alternative-nvidia glx-diversions update-glx

Reboot.
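
After the reboot, something like the following should confirm that the 470 module is the one actually loaded. This is only a sketch: nvidia_loaded_version is a hypothetical helper, and it assumes /proc/driver/nvidia/version starts with its usual "NVRM version: ..." line.

```shell
# Hypothetical helper: extract the driver version number from the text of
# /proc/driver/nvidia/version, supplied on stdin.
nvidia_loaded_version() {
    # Only the "NVRM version" line is considered; the first dotted number
    # on it is taken as the driver version.
    grep -m1 'NVRM version' | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Example (not run here):
#   nvidia_loaded_version < /proc/driver/nvidia/version
#   apt-mark showhold        # lists the packages held above
```

If the reported version is still 525.x, the old module is probably still the one in the initramfs or the DKMS build for the running kernel did not complete.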

This suggests a driver issue, no? Is there anything else I can do to resolve this on current drivers? The M2200 is documented as supported.

Yes, this seems to be a driver bug specific to your notebook model. You are also already running the latest system BIOS.
You might try installing drivers 530/535 to check whether this has already been fixed, and mail your report to linux-bugs[at]nvidia.com for some attention.


I emailed linux-bugs@ on Jun 14 and Jun 16, 2023. No response. Adfar Banday tells me via chat that the only escalation option is to ask for an update here.

Update?

@allanwind
I will try to find a similar notebook internally to reproduce the issue locally, which will help us debug it.
In the meantime, we are analyzing the logs you shared and will see if they lead us to the root cause.

Thank you so much. Let me know if there is anything I can do to help (allan@yaxto.com if email works better for you than the forum). This is my primary laptop for work, so I am wary of using non-Debian-packaged versions of the drivers.

We were not able to root-cause the exact issue, hence we are requesting a few more logs.
Please collect a fresh bug report from the repro state, along with the output of “nvidia-debugdump -D”, and share them with us.
Also, please fall back to the older 470 branch driver and share a bug report from the no-repro state.

no repro bug report:

nvidia-bug-report.log.gz (338.0 KB)

It will take me a little time to upgrade and supply the repro bug report and the nvidia-debugdump -D output.

@allanwind
The attached bug report shows that nvidia-smi is not working properly:
Skipping nvidia-smi output (nvidia-smi not found)

On the non-repro setup, I installed nvidia-smi and re-ran nvidia-bug-report.sh:

nvidia-bug-report.log.gz (1.0 MB)

Thanks, I shall wait for the logs and the nvidia-debugdump -D output from the repro state.

As promised, here is the data from the repro state (525.105.17-1). It left a temp file around, so I included that for good measure, too:

nvidia-bug-report.log.gz (441.4 KB)
nvidia-debugdump.zip (347.1 KB)
nvidia-nvml-temp2325.log (326 Bytes)

Update?

Thanks for sharing the logs. I have filed bug 4197038 internally for tracking purposes.
However, the shared logs are not sufficient to confirm exactly where the issue is happening.
So I am trying to find a similar notebook on our premises to attempt a local repro.
This will help us debug the issue further.

Thanks for the update. Please let me know if there is anything else I can do to advance this.

Update?

Debian 12.1 was released today, which appears to have resolved the issue. Here are the relevant version changes and changelogs in case it helps:

linux-image-6.1.0-10-amd64:amd64 (6.1.37-1, 6.1.38-1) changelog
nvidia-driver:amd64 (525.105.17-1, 525.125.06-1~deb12u1) changelog
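
If you want to check whether your own system already has at least these versions, here is a minimal sketch. deb_version_at_least is a hypothetical helper name; it relies on dpkg --compare-versions, so it only works on Debian-based systems. Note that I have not determined whether the kernel or the driver update is what fixed it.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to
# version $2 under Debian's version ordering.
deb_version_at_least() {
    dpkg --compare-versions "$1" ge "$2"
}

# Example (not run here):
#   installed=$(dpkg-query -W -f='${Version}' nvidia-driver)
#   deb_version_at_least "$installed" '525.125.06-1~deb12u1' \
#       && echo "nvidia-driver is at the 12.1 version or later"
```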

I submitted this issue on June 12; today is July 22, and sadly I received no end-user support.

Apologies, @allanwind, for not being of much help; there was a delay in getting hold of the same notebook model, and the issue was specific to the hardware.
But I am glad to hear that you are no longer seeing this issue.