System dies on the second resume

Several months ago NVIDIA finally fixed resume for my GPU (GTX 660), but unfortunately the fix wasn’t complete: the system can resume from sleep only once. If I resume a second time, the system dies:

NVRM: GPU at PCI:0000:01:00: GPU-136382c0-06fa-2c0f-977a-4f04b1755070
NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000000 00000088 21010005 00000007 00000000
NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000000 0000008c 00000000 00000005 0000102b
NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000000 00000080 00000000 00000005 00001005
NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
SysRq : Emergency Sync
NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Emergency Sync complete
NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
SysRq : Emergency Sync
Emergency Sync complete
SysRq : Emergency Remount R/O

X.org log:

[ 86897.695] (EE)
[ 86897.695] (EE) Backtrace:
[ 86897.718] (EE) 0: /usr/bin/X (xorg_backtrace+0x59) [0x82045f9]
[ 86897.718] (EE) 1: /usr/bin/X (0x8048000+0x1c0996) [0x8208996]
[ 86897.718] (EE) 2: (vdso) (__kernel_rt_sigreturn+0x0) [0xb76fec94]
[ 86897.718] (EE) 3: /lib/libc.so.6 (0xb7172000+0x137918) [0xb72a9918]
[ 86897.718] (EE) 4: /usr/lib/xorg/modules/drivers/nvidia_drv.so (0xb39b0000+0xbdfa0) [0xb3a6dfa0]
[ 86897.718] (EE)
[ 86897.718] (EE) Segmentation fault at address 0xb76f5000
[ 86897.718] (EE)
Fatal server error:
[ 86897.718] (EE) Caught signal 11 (Segmentation fault). Server aborting
[ 86897.718] (EE)
[ 86897.718] (EE)

I’m running a vanilla 3.18 i686 PAE kernel with 16 GB of RAM.

Bump

I also have this issue; please see here:

https://devtalk.nvidia.com/default/topic/803899/linux/suspen-to-ram-resume-failure-on-opensuse-13-2-using-gtx750/

Actually, with newer drivers I can suspend and resume two or three times before the system dies.

[347031.778] (EE) Backtrace:
[347031.873] (EE) 0: /usr/bin/X (xorg_backtrace+0x59) [0x82045f9]
[347031.873] (EE) 1: /usr/bin/X (0x8048000+0x1c0996) [0x8208996]
[347031.873] (EE) 2: (vdso) (__kernel_rt_sigreturn+0x0) [0xb7725c94]
[347031.873] (EE) 3: /lib/libc.so.6 (0xb719a000+0x137718) [0xb72d1718]
[347031.873] (EE) 4: /usr/lib/xorg/modules/drivers/nvidia_drv.so (0xb395f000+0xa10a1) [0xb3a000a1]
[347031.873] (EE)
[347031.873] (EE) Segmentation fault at address 0xb771c000
[347031.873] (EE)
Fatal server error:
[347031.873] (EE) Caught signal 11 (Segmentation fault). Server aborting


Jan 27 10:37:41 localhost kernel: [346701.270650] NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000000 00000088 21010007 00000007 00000000
Jan 27 10:37:41 localhost kernel: [346701.270662] NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000000 0000008c 00000000 00000005 0000102b
Jan 27 10:37:44 localhost kernel: [346704.326465] NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000000 00000080 00000000 00000005 00001005

Jan 27 10:37:54 localhost kernel: [346714.291252] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Jan 27 10:37:56 localhost kernel: [346716.289387] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Jan 27 10:38:08 localhost kernel: [346726.306614] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

The 346.35 drivers are actually worse in this regard: they crash on the first resume. Darn!

Feb  3 13:01:53 localhost kernel: [143401.832053] NVRM: GPU at PCI:0000:01:00: GPU-136382c0-06fa-2c0f-977a-4f04b1755070
Feb  3 13:01:53 localhost kernel: [143401.832058] NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000000 00000088 21010007 00000007 00000000
Feb  3 13:01:53 localhost kernel: [143401.832073] NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000000 0000008c 00000000 00000005 0000102b
Feb  3 13:01:56 localhost kernel: [143404.910309] NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000000 00000080 00000000 00000005 00001005
Feb  3 13:01:59 localhost kernel: [143407.961209] NVRM: Xid (PCI:0000:01:00): 32, Channel ID 00000001 intr 80044000
Feb  3 13:01:59 localhost kernel: [143407.961693] NVRM: Xid (PCI:0000:01:00): 32, Channel ID 00000001 intr 80004000
Feb  3 13:01:59 localhost kernel: [143407.969801] NVRM: Xid (PCI:0000:01:00): 32, Channel ID 00000001 intr 80004000
Feb  3 13:01:59 localhost kernel: [143407.970177] NVRM: Xid (PCI:0000:01:00): 32, Channel ID 00000001 intr 80004000
Feb  3 13:02:02 localhost kernel: [143410.992135] NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000000 00000080 00000000 00000005 00001005
Feb  3 13:02:05 localhost kernel: [143413.990077] NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000000 00000080 00000000 00000005 00001005
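
For anyone who wants to compare logs like the ones above across driver versions, here is a minimal sketch that tallies the Xid codes per PCI device. It assumes the kernel lines keep the `NVRM: Xid (PCI:...): <code>,` shape seen in this thread; `XID_RE` and `count_xids` are names made up for this example, not part of any NVIDIA tool.

```python
import re
from collections import Counter

# Matches the Xid lines as they appear in this thread, e.g.
#   NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000000 ...
# The pattern is an assumption based on these logs only.
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): (\d+),")


def count_xids(lines):
    """Return a Counter keyed by (pci_address, xid_code)."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# Two lines copied from the log above, plus one non-Xid line that is skipped.
sample = [
    "Feb  3 13:01:53 localhost kernel: [143401.832058] NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000000 00000088 21010007 00000007 00000000",
    "Feb  3 13:01:59 localhost kernel: [143407.961209] NVRM: Xid (PCI:0000:01:00): 32, Channel ID 00000001 intr 80044000",
    "NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context",
]

for (device, code), n in sorted(count_xids(sample).items()):
    print(f"{device} Xid {code}: {n}")
```

To run it on a real log, pass the lines of the saved kernel log (or the output of `dmesg`) to `count_xids` instead of `sample`.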

The error message below is repeated at least 25 times:

(**) NVIDIA(0): Using HorizSync/VertRefresh ranges from the EDID for display
(**) NVIDIA(0):     device LG Electronics 24MP55 (DFP-1) (Using EDID
(**) NVIDIA(0):     frequencies has been enabled on all display devices.)
(WW) NVIDIA(GPU-0): The EDID for LG Electronics 24MP55 (DFP-1) contradicts itself:
(WW) NVIDIA(GPU-0):     mode "1920x1080" is specified in the EDID; however, the
(WW) NVIDIA(GPU-0):     EDID's valid HorizSync range (30.000-83.000 kHz) would
(WW) NVIDIA(GPU-0):     exclude this mode's HorizSync (28.1 kHz); ignoring
(WW) NVIDIA(GPU-0):     HorizSync check for mode "1920x1080".
(WW) NVIDIA(GPU-0): The EDID for LG Electronics 24MP55 (DFP-1) contradicts itself:
(WW) NVIDIA(GPU-0):     mode "1920x1080" is specified in the EDID; however, the
(WW) NVIDIA(GPU-0):     EDID's valid VertRefresh range (56.000-61.000 Hz) would
(WW) NVIDIA(GPU-0):     exclude this mode's VertRefresh (50.0 Hz); ignoring
(WW) NVIDIA(GPU-0):     VertRefresh check for mode "1920x1080".
(WW) NVIDIA(GPU-0): The EDID for LG Electronics 24MP55 (DFP-1) contradicts itself:
(WW) NVIDIA(GPU-0):     mode "720x576" is specified in the EDID; however, the
(WW) NVIDIA(GPU-0):     EDID's valid VertRefresh range (56.000-61.000 Hz) would
(WW) NVIDIA(GPU-0):     exclude this mode's VertRefresh (50.0 Hz); ignoring
(WW) NVIDIA(GPU-0):     VertRefresh check for mode "720x576".
(WW) NVIDIA(GPU-0): The EDID for LG Electronics 24MP55 (DFP-1) contradicts itself:
(WW) NVIDIA(GPU-0):     mode "1920x1080" is specified in the EDID; however, the
(WW) NVIDIA(GPU-0):     EDID's valid VertRefresh range (56.000-61.000 Hz) would
(WW) NVIDIA(GPU-0):     exclude this mode's VertRefresh (50.0 Hz); ignoring
(WW) NVIDIA(GPU-0):     VertRefresh check for mode "1920x1080".
(WW) NVIDIA(GPU-0): The EDID for LG Electronics 24MP55 (DFP-1) contradicts itself:
(WW) NVIDIA(GPU-0):     mode "1280x720" is specified in the EDID; however, the
(WW) NVIDIA(GPU-0):     EDID's valid VertRefresh range (56.000-61.000 Hz) would
(WW) NVIDIA(GPU-0):     exclude this mode's VertRefresh (50.0 Hz); ignoring
(WW) NVIDIA(GPU-0):     VertRefresh check for mode "1280x720".
(**) NVIDIA(0): Using HorizSync/VertRefresh ranges from the EDID for display
(**) NVIDIA(0):     device LG Electronics 24MP55 (DFP-1) (Using EDID
(**) NVIDIA(0):     frequencies has been enabled on all display devices.)
(II) NVIDIA(0): Setting mode "DFP-1:nvidia-auto-select"
(EE) NVIDIA(GPU-0): Failed to initialize DMA.
(EE)  *** Aborting ***
(EE) NVIDIA(0): Error recovery failed.
(EE) NVIDIA(0):  *** Aborting ***
(EE) Fatal server error:
(EE) Failed to recover from error!
(EE) 
(EE) 
(EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
(EE) 
(WW) NVIDIA(0): WAIT (2, 8, 0x8000, 0x7eeae0a0, 0x0000000c)
(WW) NVIDIA(0): WAIT (1, 8, 0x8000, 0x7eeae0a0, 0x0000000c)

346.35 won’t resume even once.

343.36 will sometimes survive three suspend/resume cycles.

As of kernel 4.4 and NVIDIA driver 358.16, this problem is resolved.

Now let’s pray future kernel/driver updates won’t break it.

Thanks, NVIDIA!