Problem with a quadro m3000m, Xorg 1.20, and black screen

I have a thinkpad p70 with a quadro m3000m that hangs X whenever I start it up. I’ve tried multiple drivers with various xorg.conf with no luck. I’ve attached the bug report. Hopefully someone has figured out this issue.
nvidia-bug-report.log.gz (37.1 KB)
nvidia-bug-report.log.old.gz (40.7 KB)

I’m a bit puzzled by the logs you provided. The Xorg.0.log from the .old archive shows that everything is working fine, while the Xorg.0.log from the newer archive shows a freeze while loading libglx.so after restarting the X server.
What happened after the first start, and what kind of DE are you using?

Hi,

Thanks for replying. I’m at a loss on it too, as I didn’t see anything in the logs that would stop the system from working, but the same thing happens every time: a black screen, and the keyboard goes unresponsive, so I can’t kill X. Pressing the caps lock and num lock keys doesn’t even light up their LEDs. I had to write a short script that starts X and then runs nvidia-bug-report.

I’m using lightdm with xfce4. I get the same result, a black screen, if I attempt to start X manually or run any other login manager. It appears to be an issue with the ThinkPad P70. Searching this site revealed one other report with the same result: https://devtalk.nvidia.com/default/topic/1024815/linux/x-org-1-18-1-19-hang-on-start-on-lenovo-p70-quadro-m4000m-with-discrete-graphics-only-enabled-in-bios/post/5228664/

Ok, then I can make some sense out of the logs. Looks like your script should have waited at least half a minute between starting X and running nvidia-bug-report.sh. You caught the Xserver in mid-air so the log looks truncated.
Please delete your xorg.conf and insert a
sleep 30
in your script and attach the newly created log.
The .old logs show that the X server isn’t actually frozen, but somehow there seems to be no input/output.
Did you already update your bios and see if you can configure it for hybrid graphics mode?

Hi,

I’ve tried it without an xorg.conf before with the same result. The bios is up to date and yes, I can configure hybrid graphics, but I’d like to avoid using the intel gpu altogether. The same setup works fine in Windows. Leaving the system running longer than ~15 seconds with the black screen results in the system being locked up. One can’t ssh into it and it won’t run anything.

Hardware diagnostics give the system a clean bill of health. Running a live image using nouveau gives me a working desktop, but nouveau isn’t really useful to me.

No one has any ideas? I’ve re-run the bug report utility again this morning and got a much bigger and more detailed report. Maybe this will help.
nvidia-bug-report.log.gz (421 KB)

The problem is that the logs you’re providing are a snapshot from when the system is still working, so no errors are visible. Furthermore, your system logger does not appear to be set up for persistent logs, e.g. /var/log/messages for kernel messages; at least nothing is included. So to debug this further, you’ll have to set up your system logger for this. Or does /var/log/messages exist on your system and contain the logs from previous boots?
When you have correctly set this up, you can let it crash and provide the logs after reboot.
Second, you’re running Gentoo, so you have a self-configured kernel which might be missing things. Did you already try a standard Ubuntu install to rule this out?
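
A rough sketch for the persistent-log question, assuming the system uses systemd-journald. The thread does not say which logger is installed, and a Gentoo box may well run syslog-ng or sysklogd instead, in which case this does not apply. With journald's default Storage=auto setting, logs survive a reboot only if the /var/log/journal directory exists. The function name journal_is_persistent is made up for illustration.

```shell
#!/bin/bash
# journal_is_persistent [DIR]
# Succeeds if the journal directory exists. With journald's default
# Storage=auto, that directory is what makes logs persist across boots.
# ASSUMPTION: the system runs systemd-journald; DIR defaults to the
# standard location /var/log/journal.
journal_is_persistent() {
    [ -d "${1:-/var/log/journal}" ]
}
```

If the check fails, creating the directory as root (mkdir -p /var/log/journal) and restarting systemd-journald should enable persistence. After the next hang and reboot, journalctl -k -b -1 prints the kernel messages from the previous boot.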

After checking the report, grabbing a few missing things such as xset, and updating to the latest nvidia driver, then running the bug report again, I’m providing the latest report.

Hopefully this helps.
nvidia-bug-report.log.gz (1.08 MB)

/var/log exists on the system. The previous logs look identical with no errors except for the logitech usb receiver messages and time stamps. I’ve tried a default linux mint before just to rule that out. Same issue. A search reveals that this is definitely a problem with the nvidia binary and thinkpad p70s when switched over to dedicated graphics instead of hybrid using quadros above the m600 gpu. I’ve followed threads on other sites which all end the same; the user gives up. I’m trying here with the full knowledge of at least one other thread on this site which ended the same way. Hopefully, someone will take this seriously.

Thank you for trying to help.

EDIT: I just installed the vulkan-loader and will rerun the script again. I will also upload my kernel .config which works for other systems running nvidia cards and the proprietary drivers. It also worked fine for this laptop when it had the quadro m600m card installed.

Here is the short script I’ve been running too:

#!/bin/bash
startxfce4 &
sleep 15 &
nvidia-bug-report.sh &
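
A note for anyone reusing the script above: because the sleep line ends in &, it runs in the background and nvidia-bug-report.sh starts immediately, which would explain logs that look cut off. A minimal sketch of a variant that actually waits is below. The function name run_and_report and its three arguments are illustrative only, and the 30 second default simply follows the delay suggested earlier in the thread.

```shell
#!/bin/bash
# run_and_report [DELAY] [X_CMD] [REPORT_CMD]
# Starts the X session in the background, waits in the foreground,
# then collects the bug report.
# ASSUMPTION: the defaults below mirror the commands used in this thread.
run_and_report() {
    local delay="${1:-30}"
    local x_cmd="${2:-startxfce4}"
    local report_cmd="${3:-nvidia-bug-report.sh}"

    "$x_cmd" &          # X keeps running in the background
    sleep "$delay"      # no trailing '&', so the script really pauses here
    "$report_cmd"       # runs only after the delay has elapsed
}
```

Calling run_and_report 30 startx at the end of the script would reproduce the second variant described in the thread.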

okay, here are the last two logs. I ran the script again as is, then changed it to startx and sleep 30. The kernel config is also being uploaded. What else is helpful? If needed, I can install linux-mint on another HDD and show the output from that as well. Getting this solved without reinstalling the quadro m600m would be great :D

Thanks again.
nvidia-bug-report.log.old.gz (908 KB)
nvidia-bug-report.log.gz (1020 KB)
config.zip (22.4 KB)

Ok, the last log brought something up, not much but I don’t think you’ll ever get more out of it.
The gpu’s display engine seems to be failing on setting the mode so the X driver freezes.

  • Xorg log ends after NVIDIA(0): Setting mode “DFP-2:nvidia-auto-select”
  • the kernel is reallocating resources, looks like a gpu reset: caller _nv000934rm+0x1bf/0x1f0 [nvidia] mapping multiple BARs
  • nvidia-smi reports an inactive display: Display Active : Disabled
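
For others hitting the same hang, the three signs above can be checked against saved text files with something like the sketch below. It assumes you have already saved the X log (normally /var/log/Xorg.0.log), the dmesg output, and the output of nvidia-smi -q to files. The function name check_symptoms is made up for illustration, and the patterns are taken from the lines quoted above, so other driver versions may word them differently.

```shell
#!/bin/bash
# check_symptoms XORG_LOG KERNEL_LOG SMI_OUTPUT
# Prints one line for each of the three symptoms found in the given files.
# ASSUMPTION: the grep patterns match the messages quoted in this thread.
check_symptoms() {
    local xorg_log="$1" kernel_log="$2" smi_out="$3"

    # Does the X log stop at the mode set (within its last few lines)?
    if tail -n 3 "$xorg_log" | grep 'Setting mode' >/dev/null; then
        echo "xorg: log ends at mode set"
    fi
    # Did the kernel log the BAR remapping message?
    if grep 'mapping multiple BARs' "$kernel_log" >/dev/null; then
        echo "kernel: BAR remapping message present"
    fi
    # Does nvidia-smi report the display as inactive?
    if grep -E 'Display Active[[:space:]]*:[[:space:]]*Disabled' "$smi_out" >/dev/null; then
        echo "nvidia-smi: display inactive"
    fi
}
```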

This is a rare but not unknown failure.
Among other things, this can be caused by bad display firmware which would explain why this model is affected in general. Unfortunately, the nvidia linux driver is a bit touchy about these things in contrast to the windows driver.
To rule things out, you could connect an external monitor and then create an xorg.conf using either the ConnectedMonitor or UseDisplayDevice option to only enable the external monitor and disable the internal display. https://download.nvidia.com/XFree86/Linux-x86_64/384.98/README/xconfigoptions.html
To get the internal display working, you might try switching to efi boot and set the efifb resolution to the native resolution of 1920x1080. TBH, outlook is bad.
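
For the external-monitor test suggested above, a minimal sketch that prints an xorg.conf Device section using the UseDisplayDevice option. The display name DFP-1 is only a placeholder: the real name of the external output has to be read from Xorg.0.log on the affected machine. The function name external_only_conf and the Identifier string are illustrative as well.

```shell
#!/bin/bash
# external_only_conf [DISPLAY_NAME]
# Prints an xorg.conf Device section that restricts the NVIDIA driver
# to a single display device.
# ASSUMPTION: DFP-1 is a placeholder, not the confirmed name of the
# external output on a P70.
external_only_conf() {
    local display="${1:-DFP-1}"
    cat <<EOF
Section "Device"
    Identifier "nvidia"
    Driver     "nvidia"
    Option     "UseDisplayDevice" "$display"
EndSection
EOF
}
```

Redirecting the output to /etc/X11/xorg.conf as root (for example external_only_conf DFP-1 > /etc/X11/xorg.conf) would put it in place. Swapping UseDisplayDevice for ConnectedMonitor is the other variant mentioned above.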

I had already tried the external monitor with no success in the past. This is where I’ve been stuck, which is why I’m posting here after messing with this off and on for months. Not sure what difference it would make trying UEFI at this point, but it’s worth trying I guess.

Perhaps aplattner or one of the others will see this thread and have some suggestions or attempt to fix the bug.

Okay, so I’m back to messing around with this again using the latest blob in the repos, which is 418.56. I’ve tried without an xorg.conf, because someone in the #nvidia channel on freenode was saying that it no longer needs one to work, and with an xorg.conf generated by nvidia-settings. Same result as previous attempts. Maybe someone who works on the nvidia blob can take a look at this error report?

Thanks

EDIT appended a second bug report.
nvidia-bug-report.log.gz (47.6 KB)
nvidia-bug-report.log.gz (45.1 KB)