340.32 and 340.24 lock up on thaw (after hibernate), 319.32 is fine

Hi there,

Quick summary of this email

The X server locks up on thaw after hibernation with the 340.32 and 340.24 drivers
but not with the 319.32 driver.

The symptom is a black screen that is unresponsive to virtual terminal keys
etc. The X server is chewing 100% CPU.

distribution kubuntu 12.04
kernel 3.2.45
nvidia-drivers - 319.32 good - 340.24 and 340.32 bad
platforms - various - Dells and HPs
nvidia cards - 780s and Quadro 5000s

I’ll attach (when I find the attach function!) two nvidia bug report log files - one before I attempt a
hibernate and one after the thaw when Xorg is eating 100% of the CPU.

Long story

We are doing a lot of hibernation and thawing (40 machines a night at
the moment) and have found the 340.32 (and 340.24) drivers almost always
hang on thaw, even if there are no users logged in and kdm is the only
thing running on the X server.

On the same hardware and OS the 319.32 driver works perfectly.

I’ve seen the problem on Dell and HP workstations.

A user doesn’t even have to be logged in for it to happen.

When the machine thaws the screen is black and Xorg is eating 100% CPU.

If I attach gdb to X and dump the stack I get:

(gdb) where
#0  0x00007ffbfe2722c8 in __GI___poll (fds=0x7fffcf8103f0, nfds=1, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:83
#1  0x00007ffbf8432720 in ?? () from /usr/lib/xorg/modules/drivers/nvidia_drv.so
#2  0x00007ffbf84c2ba4 in ?? () from /usr/lib/xorg/modules/drivers/nvidia_drv.so
#3  0x00007ffbf84c2e62 in ?? () from /usr/lib/xorg/modules/drivers/nvidia_drv.so
#4  0x00007ffbf84b3898 in ?? () from /usr/lib/xorg/modules/drivers/nvidia_drv.so
#5  0x00007ffbf84bcb48 in ?? () from /usr/lib/xorg/modules/drivers/nvidia_drv.so
#6  0x00007ffbf8469044 in ?? () from /usr/lib/xorg/modules/drivers/nvidia_drv.so
#7  0x00007ffbf8806d6e in ?? () from /usr/lib/xorg/modules/drivers/nvidia_drv.so
#8  0x00007ffbf87fa994 in ?? () from /usr/lib/xorg/modules/drivers/nvidia_drv.so
#9  0x00007ffc0008d7eb in ?? ()
#10 0x00007ffc0009a845 in ?? ()
#11 0x00007ffc0007f918 in xf86Wakeup ()
#12 0x00007ffc000467eb in WakeupHandler ()
#13 0x00007ffc00179c76 in WaitForSomething ()
#14 0x00007ffc000425f2 in ?? ()
#15 0x00007ffc000317ba in ?? ()
#16 0x00007ffbfe1ab76d in __libc_start_main (main=0x7ffc00031420, argc=8, ubp_av=0x7fffcf811118, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7fffcf811108) at libc-start.c:226
#17 0x00007ffc00031aad in _start ()
(gdb)
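
For anyone wanting to collect the same kind of trace from many machines, the attach-and-dump step can be scripted. The following is only a minimal sketch, assuming gdb is installed and the script runs as root; the function names are made up for illustration, and it uses gdb's `bt` command, which is equivalent to the `where` used above.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb command line that attaches to pid, prints the stack, then exits."""
    # -p attaches to a running process, -batch exits once the commands finish,
    # -ex runs the given gdb command ("bt" is the same as "where").
    return ["gdb", "-p", str(pid), "-batch", "-ex", "bt"]


def capture_backtrace(pid, timeout=60):
    """Run gdb against a live process and return its combined output as text."""
    result = subprocess.run(
        gdb_backtrace_command(pid),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        timeout=timeout,
    )
    return result.stdout
```

The PID of the X server would have to be looked up first (for example with `pidof X`, assuming that is the process name on the machine in question) and passed to `capture_backtrace`.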

The 343 drivers won’t start up. I’ll make a separate report for that.

[This file was removed because it was flagged as potentially malicious] (246 KB)
x-server-ok-before-hibernate-nvidia-bug-report.log.gz (242 KB)

I’ve restarted this project. I have not had a single problem like this since I started unbinding all USB devices before hibernating. No more problems with the X server chewing 100% CPU.

Essentially I used the script from here

http://thecodecentral.com/2011/01/18/fix-ubuntu-10-10-suspendhibernate-not-working-bug
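
In case the link goes away, the general idea of unbinding USB host controllers before hibernating and rebinding them after thaw can be sketched as follows. This is not the linked script, just an illustration of the technique using the kernel's sysfs `bind`/`unbind` files. The driver names (`ehci_hcd`, `xhci_hcd`) and all function names are assumptions; which drivers are present depends on the kernel and hardware.

```python
import os
import re

# PCI addresses look like 0000:00:1a.0 (domain:bus:device.function).
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")

# Assumed USB host controller driver names; adjust for the machine in question.
USB_DRIVERS = ("ehci_hcd", "xhci_hcd")


def bound_devices(driver_dir):
    """Return the PCI addresses currently bound to the driver in driver_dir."""
    if not os.path.isdir(driver_dir):
        return []
    return sorted(e for e in os.listdir(driver_dir) if PCI_ADDR.match(e))


def write_ids(driver_dir, action, addresses):
    """Write each address to the driver's 'bind' or 'unbind' control file."""
    for addr in addresses:
        with open(os.path.join(driver_dir, action), "w") as f:
            f.write(addr)


def unbind_all(pci_drivers_root="/sys/bus/pci/drivers", drivers=USB_DRIVERS):
    """Unbind every device from the given drivers; return what was unbound."""
    unbound = {}
    for name in drivers:
        driver_dir = os.path.join(pci_drivers_root, name)
        addrs = bound_devices(driver_dir)
        write_ids(driver_dir, "unbind", addrs)
        unbound[name] = addrs
    return unbound


def rebind_all(unbound, pci_drivers_root="/sys/bus/pci/drivers"):
    """Rebind the devices recorded by unbind_all (to be called after thaw)."""
    for name, addrs in unbound.items():
        write_ids(os.path.join(pci_drivers_root, name), "bind", addrs)
```

A real suspend hook runs separately for hibernate and thaw, so the list returned by `unbind_all` would have to be saved to a file and read back before calling `rebind_all`. Writing to the real sysfs files requires root.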

I did have one hang - but I had forgotten to install the script to unbind the devices on that machine. I have not had a hang yet (I have 10 machines that are hibernating nightly) on any machine where the script was correctly installed.

I’m not 100% sure the script is the fix - I haven’t done controlled regression tests - but just in case someone else is having the same trouble - maybe it will fix it for them.

Another possibility, of course, is that the 340.46 driver we are running now fixed it.