X.org 1.18/1.19 hang on start on Lenovo P70/Quadro M4000M with discrete graphics only enabled in BIOS

openSUSE Leap 42.3, kernel 4.4.87-25-default (current stock), nVidia driver 384.90. In case it matters: 32 GB RAM, and the OS is on a Crucial 1 TB SATA SSD.

The X server hangs on start if discrete graphics is enabled in the system BIOS (2.17). If hybrid graphics is enabled, X starts up normally and the M4000M can be used with either Optimus or Primus. With those configurations, however, OpenGL performance is rather less than I would expect (better with xorg-x11-server 1.19, but still well below a 7-year-old laptop with an AMD Radeon HD5870 using the FOSS driver).

If I boot to runlevel 3 (multiuser without X), the screen goes black upon exit from GRUB and I never get a login prompt. However, I can access the system remotely.

nvidia-bug-report.log.gz will be attached separately (I hope!)
nvidia-bug-report.log.gz (72.9 KB)

Sounds very similar to what I’ve seen from my Sager NP8658S (same as Clevo P650-RG). Every driver after 364.19 will only give me a black screen when starting X if I select Discrete graphics in the BIOS/UEFI, though they somewhat work in MSHybrid mode with other bugs and performance quirks.

That’s a bit messy. I think the log is from when you switched to discrete only. But the GLX/OpenGL is still switched to Mesa, so it crashes. You will have to reinstall the Nvidia drivers in that case.
Your CPU seems to be a Skylake; I don’t think its iGPU works well with Leap’s 4.4 kernel. I think some backport modules exist. Otherwise, try a newer kernel and the needed firmware.
Then there’s a performance issue hitting mobile Quadros:
The workaround for that is to set nvidia-drm.modeset=1 as a kernel parameter.
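
If it helps anyone trying this, here is a minimal Python sketch for checking whether the parameter actually made it onto the kernel command line. It assumes the command line can be read from /proc/cmdline (standard on Linux); the helper name `modeset_requested` is made up for illustration and is not part of any NVIDIA tool.

```python
def modeset_requested(cmdline: str) -> bool:
    """Return True if the last nvidia-drm.modeset=... token enables modesetting."""
    enabled = False
    for token in cmdline.split():
        key, _, value = token.partition("=")
        # The kernel treats '-' and '_' in parameter names as equivalent,
        # so nvidia-drm.modeset and nvidia_drm.modeset are the same setting.
        if key.replace("-", "_") == "nvidia_drm.modeset":
            # Assumption: only 1, y and Y are treated as "on" in this sketch.
            enabled = value in ("1", "y", "Y")
    return enabled


# Usage on a live system:
#   with open("/proc/cmdline") as f:
#       print(modeset_requested(f.read()))
```

This only shows what was requested at boot. Whether the nvidia-drm module loaded and honoured it is a separate question.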

Look at the last part of the log, where the discrete GPU locked up and dropped off the bus.
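
For anyone digging through a bug-report log for this kind of GPU fault, a small Python sketch that pulls out the Xid events may save some scrolling. It assumes the usual kernel message layout "NVRM: Xid (<device>): <number>, ..."; the function name `xid_events` is hypothetical.

```python
import re

# Assumed layout of a driver fault line: "NVRM: Xid (<device>): <number>, <details>"
XID = re.compile(r"NVRM: Xid \((?P<device>[^)]+)\): (?P<code>\d+)")


def xid_events(log_text):
    """Return a list of (device, xid_number) tuples in the order they appear."""
    events = []
    for line in log_text.splitlines():
        match = XID.search(line)
        if match:
            events.append((match.group("device"), int(match.group("code"))))
    return events
```

Run it over the decompressed text of nvidia-bug-report.log or over dmesg output; an empty list just means no line matched the assumed layout.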

The CPU is indeed a Skylake. I know the integrated GPU doesn’t work with base 4.4, but openSUSE has patched it extensively and the iGPU works fine.

I’ll try the nvidia-drm.modeset=1 parameter.

I don’t think the sluggish performance reported in that bug is what’s affecting me. I’m not seeing pathological slowness, just OpenGL performance that is considerably lower than I’d expect from a fairly high end adapter. And the display is actually running entirely off the Intel GPU; with Primus or Optimus, it’s my understanding that the discrete GPU is just used for rendering.

modesetting makes no difference here.

At the start of the log, the nvidia-modeset module is complaining about missing symbols. You should really purge and reinstall the driver.

I don’t see any complaints about missing symbols in the log I posted?

[ 3.054881] nvidia_modeset: module license 'NVIDIA' taints kernel.
[ 3.054897] Disabling lock debugging due to kernel taint
[ 3.054937] xhci_hcd 0000:00:14.0: hcc params 0x200077c1 hci version 0x100 quirks 0x00109810
[ 3.054997] xhci_hcd 0000:00:14.0: cache line size of 64 is not supported
[ 3.055076] nvidia_modeset: Unknown symbol nvidia_register_module (err 0)
[ 3.055082] nvidia_modeset: Unknown symbol nv_kthread_q_schedule_q_item (err 0)
[ 3.055091] nvidia_modeset: Unknown symbol nvidia_get_rm_ops (err 0)
[ 3.055100] nvidia_modeset: Unknown symbol nv_kthread_q_item_init (err 0)
[ 3.055103] nvidia_modeset: Unknown symbol nv_kthread_q_stop (err 0)
[ 3.055106] nvidia_modeset: Unknown symbol nvidia_unregister_module (err 0)
[ 3.055114] nvidia_modeset: Unknown symbol nv_kthread_q_init (err 0)
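
These lines are easy to miss in a long log. As a rough sketch (the function name `unknown_symbols` is invented for illustration), the following Python groups every "Unknown symbol" complaint by the module that raised it, assuming the "<module>: Unknown symbol <name>" layout seen in the lines above.

```python
import re
from typing import Dict, List

# Matches e.g. "nvidia_modeset: Unknown symbol nvidia_register_module (err 0)"
UNKNOWN_SYMBOL = re.compile(r"(?P<module>\w+): Unknown symbol (?P<symbol>\w+)")


def unknown_symbols(log_text: str) -> Dict[str, List[str]]:
    """Map each complaining module to the symbols it could not resolve."""
    found: Dict[str, List[str]] = {}
    for line in log_text.splitlines():
        match = UNKNOWN_SYMBOL.search(line)
        if match:
            found.setdefault(match.group("module"), []).append(match.group("symbol"))
    return found
```

A non-empty result for nvidia_modeset is the hint that the installed kernel modules do not match each other, which is why a purge and reinstall was suggested.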

OK, those errors went away after reinstall, but same behavior otherwise.
nvidia-bug-report.log.gz (72.7 KB)

X still loads Mesa. It seems to me that openSUSE is not really the best choice for Optimus laptops. They recommend using Bumblebee, but then you shouldn’t use the BIOS switch. Currently, the NVIDIA GLX is disabled.
c/p from opensuse forum:

We’re getting a bit off topic here. The issue is the inability to run on the discrete GPU alone, because the X server hangs. Is that due to Bumblebee or GLX? If so, I can try removing all of that stuff.

With the nVidia driver and the bumblebee repo, I’m able to get everything working in hybrid mode. I’d like to be able to use just the discrete GPU.

Bumblebee is blocking nvidia glx so you can use nothing but bumblebee.

I see. So the suggestion is to remove all of Bumblebee?

Yes, and hope the nvidia driver install will be reverted to the state it was before bumblebee.

OK, now we’re getting somewhere. The X server is not hanging. However, the screen remains blank (but with the backlight on). So it’s not usable, but it’s still different behavior.

Also, the screen is active during the boot sequence (in console mode).
nvidia-bug-report.log.gz (258 KB)

OK, now the NVIDIA driver is back to a working state, but there’s obviously a driver bug: on setting the mode, it throws error Xid 56.
Can you try now to set the kernel parameter
to see if it’s a workaround?

No; it actually results in no display at all during boot.

It also looks like the X server can’t be killed (with or without nvidia-drm.modeset=1) after it starts.

nvidia-bug-report.log.gz (238 KB)

I spoke a bit too soon. It looks like the X server is (re)starting in a loop this time.
nvidia-bug-report.log.gz (261 KB)

I also tried removing all of the nouveau bits (except for the libdrm stuff, which had some dependency issues); no change in behavior.