[840M on Asus UX303LN] Application/driver freeze on resume from suspend with optimus

Hi,

When I start an application with optirun and/or primus and suspend my laptop, the application is frozen on resume. From what I can gather from dmesg, the driver doesn’t seem to be able to wake up the card or restore its state properly.

Steps to reproduce:
That’s the easy part… run glxspheres (32-bit or 64-bit) through optirun, suspend, resume, and voilà! glxspheres should be frozen. The rest of the system works fine. Sometimes restarting the OpenGL app will work, sometimes a full system restart is needed.

This happens with any OpenGL application that I try to run with optirun. It has been happening since at least September (possibly earlier).

So far, the most useful log I can produce is an extract of my dmesg covering the whole suspend/resume process. The file is attached. You should notice the mess here:

[ 987.402372] NVRM: GPU at PCI:0000:03:00: GPU-c3350c76-8707-abd9-a985-52814992bd10
[ 987.402380] NVRM: Xid (PCI:0000:03:00): 13, Graphics Exception: Shader Program Header 1 Error
[ 987.402442] NVRM: Xid (PCI:0000:03:00): 13, Graphics Exception: Shader Program Header 2 Error
[ 987.402493] NVRM: Xid (PCI:0000:03:00): 13, Graphics Exception: Shader Program Header 3 Error
[ 987.402545] NVRM: Xid (PCI:0000:03:00): 13, Graphics Exception: Shader Program Header 9 Error
[ 987.402596] NVRM: Xid (PCI:0000:03:00): 13, Graphics Exception: Shader Program Header 18 Error
[ 987.402648] NVRM: Xid (PCI:0000:03:00): 13, Graphics Exception: ESR 0x405840=0xa204020e
[ 987.402727] NVRM: Xid (PCI:0000:03:00): 13, Graphics Exception: ChID 0010, Class 0000b097, Offset 00001644, Data 00000001
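For anyone triaging a similar log, here is a minimal Python sketch that pulls the NVRM Xid events out of dmesg text. The function name `parse_xid_lines` and the regular expression are illustrative assumptions based only on the line format shown above; other driver versions may format these lines differently.

```python
import re

# Matches lines shaped like the ones quoted above, e.g.
# [ 987.402380] NVRM: Xid (PCI:0000:03:00): 13, Graphics Exception: ...
# The pattern is an assumption derived from this log excerpt only.
XID_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NVRM: Xid \((?P<pci>PCI:[0-9a-fA-F:.]+)\): "
    r"(?P<xid>\d+), (?P<msg>.*)"
)


def parse_xid_lines(dmesg_text):
    """Return one dict per NVRM Xid line found in the given dmesg text."""
    events = []
    for line in dmesg_text.splitlines():
        match = XID_RE.search(line)
        if match:
            events.append({
                "timestamp": float(match["ts"]),
                "pci": match["pci"],
                "xid": int(match["xid"]),
                "message": match["msg"].strip(),
            })
    return events


# Three lines copied from the excerpt above; the first is not an Xid event.
SAMPLE = """\
[ 987.402372] NVRM: GPU at PCI:0000:03:00: GPU-c3350c76-8707-abd9-a985-52814992bd10
[ 987.402380] NVRM: Xid (PCI:0000:03:00): 13, Graphics Exception: Shader Program Header 1 Error
[ 987.402648] NVRM: Xid (PCI:0000:03:00): 13, Graphics Exception: ESR 0x405840=0xa204020e
"""

for event in parse_xid_lines(SAMPLE):
    print(event["timestamp"], event["xid"], event["message"])
```

Feeding it the full dmesg output instead of `SAMPLE` should make it easy to see whether the Xid 13 burst always lands right after resume.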

I did check that the crash does not happen under Windows 10, so this is not a hardware issue.

Additional info:

  • Distribution is Arch Linux

Package version(s):

  • bumblebee 3.2.1-10
  • primus 20151110-1
  • nvidia 361.28-1
  • xorg 1.18.1-3
  • kernel 4.4.1-2-ARCH

Hardware:

  • laptop is an ASUS UX303LN with an Nvidia GeForce 840M graphics card + an Intel Corporation Haswell-ULT Integrated Graphics Controller.

Is anyone else seeing this? Does anyone have any hints?

Additional note: I’ve opened a bug on the bumblebee issue tracker on GitHub, just in case. It could be a driver issue or a bumblebee issue.

I’ll attach an nvidia-bug-report log. It looks more and more like a bad driver bug.

I’ve been investigating the suspend process using the instructions here: https://01.org/blogs/rzhang/2015/best-practice-debug-linux-suspend/hibernate-issues and it looks like bumblebee does its job of loading and unloading the module on application start and stop, and the driver does indeed accept the calls to suspend and to resume in a timely fashion, yet I still get some very bad-looking Xid messages in the kernel logs and the usual application freeze.
nvidia-bug-report.log.gz (152 KB)

Ok, I’ll add a new report to this “case” (in fact a forum post which feels really lonely).

This report was generated after the following procedure:

  • start an application with optirun (glxspheres64)
  • suspend to RAM
  • resume the system
    ----> nvidia report capture happens here
  • Kill the frozen app

Note that the report was generated with these two kernel command-line options, which give more debug information about suspend: initcall_debug no_console_suspend
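Before capturing a report, it may be worth confirming that both options really are active on the running kernel. The following is a small Python sketch under that assumption; the helper names `missing_debug_params` and `read_cmdline` are hypothetical, while `/proc/cmdline` is the standard Linux file exposing the boot command line.

```python
# The two suspend-debugging options mentioned above.
REQUIRED_PARAMS = ("initcall_debug", "no_console_suspend")


def missing_debug_params(cmdline, required=REQUIRED_PARAMS):
    """Return the required options that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [param for param in required if param not in present]


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with (Linux only)."""
    with open(path) as handle:
        return handle.read()
```

On the affected machine, `missing_debug_params(read_cmdline())` should return an empty list when both options are set; anything it returns still needs to be added to the bootloader entry.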

Also, I’ve been using the following configuration for bumblebee (just being thorough even if the bug isn’t on bumblebee’s side IMHO):

[bumblebeed]
VirtualDisplay=:8
KeepUnusedXServer=false
ServerGroup=bumblebee
TurnCardOffAtExit=false
NoEcoModeOverride=false
Driver=nvidia
XorgConfDir=/etc/bumblebee/xorg.conf.d

[optirun]
Bridge=primus
VGLTransport=yuv
PrimusLibraryPath=/usr/lib/primus:/usr/lib32/primus
AllowFallbackToIGC=false

[driver-nvidia]
KernelDriver=nvidia
PMMethod=bbswitch
LibraryPath=/usr/lib/nvidia:/usr/lib32/nvidia
XorgModulePath=/usr/lib/nvidia/xorg/,/usr/lib/xorg/modules
XorgConfFile=/etc/bumblebee/xorg.conf.nvidia

[driver-nouveau]
KernelDriver=nouveau
PMMethod=auto
XorgConfFile=/etc/bumblebee/xorg.conf.nouveau
nvidia-bug-report-with-app-frozen.log.gz (171 KB)
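When comparing setups with other affected users, the handful of settings that touch power management are the interesting ones. Below is a minimal Python sketch that extracts them from a bumblebee.conf-style text with the standard `configparser` module. The function name `power_settings` and the choice of which keys matter are assumptions for illustration, not something bumblebee itself provides.

```python
import configparser

# Excerpt of the configuration quoted above (only the keys the sketch reads).
SAMPLE_CONF = """\
[bumblebeed]
VirtualDisplay=:8
TurnCardOffAtExit=false
Driver=nvidia

[optirun]
Bridge=primus

[driver-nvidia]
KernelDriver=nvidia
PMMethod=bbswitch
"""


def power_settings(conf_text):
    """Summarise the settings most relevant to suspend/resume behaviour."""
    # Only "=" is treated as a delimiter so values such as ":8" stay intact.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(conf_text)
    driver = parser["bumblebeed"]["Driver"]
    driver_section = parser["driver-" + driver]
    return {
        "driver": driver,
        "bridge": parser["optirun"]["Bridge"],
        "pm_method": driver_section["PMMethod"],
        "turn_card_off_at_exit": parser["bumblebeed"].getboolean("TurnCardOffAtExit"),
    }


print(power_settings(SAMPLE_CONF))
```

The same function can be pointed at the contents of a real configuration file, as long as it contains the sections read here.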

Seriously? Nothing? Not even a “Post more logs, dude”?

Does this issue reproduce with an NVIDIA PRIME configuration? See ftp://download.nvidia.com/XFree86/Linux-x86_64/361.28/README/randr14.html for the setup. You can test with both the Intel and the modesetting drivers.

I’ll give it a try asap. Thanks!

OK, so TL;DR: suspend and resume work when I avoid using bumblebee.

On the other hand, I get gnome-shell crashes when starting it through GDM, and I get a black screen of near-death when plugging a monitor into the HDMI port (it gets back to normal when I unplug the monitor).

I’ll try to report the gnome-shell crash upstream. I’ll probably get the usual “stop using a proprietary driver” kind of response.

As for the secondary monitor issue, I have no idea what I can do.

Suggestions ?

Cheers,
Marc.

OK, both issues seem to be GNOME-related. For once, I’ve found an Ubuntu bug report that actually describes the issue accurately. An accurate bug report on Ubuntu means the end of the world is near, or that winter is coming :)

–> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1559576

I’ll wait for a gnome fix and report back when I have news.

Oh, @sandipt, what do you think causes the driver crash on resume? Bumblebee? An nvidia driver issue? Misuse of something? I’ve got an ongoing bug report with bumblebee and I’d like to be able to give them some feedback on my findings.

Cheers,
Marc.

OK, more news:

  • I get screen corruption when resuming from suspend (I can take a screenshot if needed)
  • even when not using GDM, which seems unable to start X properly when not using bumblebee, gnome-shell still gives me a black screen when plugging in a monitor on either HDMI or DP.

So, I can play games and suspend, but not much else…