(SOLVED) resume from suspend not working with 980 Ti, drivers 352 - 370, kernels 3.16 - 4.4

I have a system built on an Asus Z170M-PLUS with an EVGA 980 Ti Hybrid.
The system boots without any problems visible to me.
It was built for GPU computing and does not show any problems under load.
It crashes maybe 50% of the time when resuming from suspend.
When it crashes on resume, the screen turns on but stays black.
SSH into the machine often, but not always, works and shows an Xorg process at 100% CPU.

I have run both Linux Mint 17.3 and Fedora on it with kernels from 3.x up to 4.4 and cycled through Nvidia drivers 350.xx to 370 and still see this problem.
I can’t try nouveau as the card is not supported there yet.

I have not seen this problem when I don’t let the machine suspend.

Will follow up with bug logs after next crash.

This log is from the system just after boot.
nvidia-bug-report.log.gz (228 KB)

I was not able to get more than a few seconds of SSH access before the system became unresponsive.

Linux Mint /var/log/syslog:
Feb 29 11:41:39 roxy anacron[3323]: Anacron 2.3 started on 2016-02-29
Feb 29 11:41:39 roxy anacron[3323]: Normal exit (0 jobs run)
Feb 29 11:41:46 roxy kernel: [12060.269190] PM: Syncing filesystems … done.
Feb 29 11:41:46 roxy kernel: [12060.276261] PM: Preparing system for sleep (mem)
Feb 29 11:41:46 roxy kernel: [12060.276449] Freezing user space processes … (elapsed 0.001 seconds) done.
Feb 29 11:41:46 roxy kernel: [12060.277901] Freezing remaining freezable tasks … (elapsed 0.001 seconds) done.
Feb 29 11:41:46 roxy kernel: [12060.278986] PM: Suspending system (mem)
Feb 29 11:41:46 roxy kernel: [12060.278998] Suspending console(s) (use no_console_suspend to debug)
Feb 29 11:41:46 roxy kernel: [12060.279135] wlan0: deauthenticating from ac:9e:17:e9:82:c4 by local choice (Reason: 3=DEAUTH_LEAVING)
Feb 29 11:41:46 roxy kernel: [12060.337144] serial 00:02: disabled
Feb 29 11:41:46 roxy kernel: [12060.337146] serial 00:02: System wakeup disabled by ACPI
Feb 29 11:41:46 roxy kernel: [12060.337291] parport_pc 00:01: disabled
Feb 29 11:41:46 roxy kernel: [12060.337413] e1000e: EEE TX LPI TIMER: 00000011
Feb 29 11:41:46 roxy kernel: [12060.545664] Trying to free nonexistent resource <000000000000c000-000000000000c0ff>
Feb 29 11:41:46 roxy kernel: [12060.618325] PM: suspend of devices complete after 338.607 msecs
Feb 29 11:41:46 roxy kernel: [12060.618949] PM: late suspend of devices complete after 0.621 msecs
Feb 29 11:41:46 roxy kernel: [12060.619331] e1000e 0000:00:1f.6: System wakeup enabled by ACPI
Feb 29 11:41:46 roxy kernel: [12060.619527] xhci_hcd 0000:00:14.0: System wakeup enabled by ACPI
Feb 29 11:41:46 roxy kernel: [12060.634753] PM: noirq suspend of devices complete after 15.775 msecs
Feb 29 11:41:46 roxy kernel: [12060.634992] ACPI: Preparing to enter system sleep state S3
Feb 29 11:41:46 roxy kernel: [12060.838394] PM: Saving platform NVS memory
Feb 29 11:41:46 roxy kernel: [12060.838406] Disabling non-boot CPUs …
Feb 29 11:41:46 roxy kernel: [12060.839620] smpboot: CPU 1 is now offline
Feb 29 11:41:46 roxy kernel: [12060.863654] smpboot: CPU 2 is now offline
Feb 29 11:41:46 roxy kernel: [12060.879660] smpboot: CPU 3 is now offline
Feb 29 11:41:46 roxy kernel: [12060.895696] smpboot: CPU 4 is now offline
Feb 29 11:41:46 roxy kernel: [12060.915685] smpboot: CPU 5 is now offline
Feb 29 11:41:46 roxy kernel: [12060.931664] smpboot: CPU 6 is now offline
Feb 29 11:41:46 roxy kernel: [12060.950638] Broke affinity for irq 19
Feb 29 11:41:46 roxy kernel: [12060.950639] Broke affinity for irq 121
Feb 29 11:41:46 roxy kernel: [12060.951652] smpboot: CPU 7 is now offline
Feb 29 11:41:46 roxy kernel: [12060.970569] ACPI: Low-level resume complete
Feb 29 11:41:46 roxy kernel: [12060.970673] PM: Restoring platform NVS memory
Feb 29 11:41:46 roxy kernel: [12060.971536] Enabling non-boot CPUs …
Feb 29 11:41:46 roxy kernel: [12060.971591] x86: Booting SMP configuration:
Feb 29 11:41:46 roxy kernel: [12060.971592] smpboot: Booting Node 0 Processor 1 APIC 0x2
Feb 29 11:41:46 roxy kernel: [12060.987467] cache: parent cpu1 should not be sleeping
Feb 29 11:41:46 roxy kernel: [12060.987557] CPU1 is up
Feb 29 11:41:46 roxy kernel: [12060.987574] smpboot: Booting Node 0 Processor 2 APIC 0x4
Feb 29 11:41:46 roxy kernel: [12061.003514] cache: parent cpu2 should not be sleeping
Feb 29 11:41:46 roxy kernel: [12061.003574] CPU2 is up
Feb 29 11:41:46 roxy kernel: [12061.003584] smpboot: Booting Node 0 Processor 3 APIC 0x6
Feb 29 11:41:46 roxy kernel: [12061.015579] cache: parent cpu3 should not be sleeping
Feb 29 11:41:46 roxy kernel: [12061.015636] CPU3 is up
Feb 29 11:41:46 roxy kernel: [12061.015644] smpboot: Booting Node 0 Processor 4 APIC 0x1
Feb 29 11:41:46 roxy kernel: [12061.031632] cache: parent cpu4 should not be sleeping
Feb 29 11:41:46 roxy kernel: [12061.031691] CPU4 is up
Feb 29 11:41:46 roxy kernel: [12061.031701] smpboot: Booting Node 0 Processor 5 APIC 0x3
Feb 29 11:41:46 roxy kernel: [12061.047683] cache: parent cpu5 should not be sleeping
Feb 29 11:41:46 roxy kernel: [12061.047740] CPU5 is up
Feb 29 11:41:46 roxy kernel: [12061.047762] smpboot: Booting Node 0 Processor 6 APIC 0x5
Feb 29 11:41:46 roxy kernel: [12061.059693] cache: parent cpu6 should not be sleeping
Feb 29 11:41:46 roxy kernel: [12061.059750] CPU6 is up
Feb 29 11:41:46 roxy kernel: [12061.059758] smpboot: Booting Node 0 Processor 7 APIC 0x7
Feb 29 11:41:46 roxy kernel: [12061.075744] cache: parent cpu7 should not be sleeping
Feb 29 11:41:46 roxy kernel: [12061.075801] CPU7 is up
Feb 29 11:41:46 roxy kernel: [12061.081867] ACPI: Waking up from system sleep state S3
Feb 29 11:41:46 roxy kernel: [12061.099709] xhci_hcd 0000:00:14.0: System wakeup disabled by ACPI
Feb 29 11:41:46 roxy kernel: [12061.099828] PM: noirq resume of devices complete after 15.207 msecs
Feb 29 11:41:46 roxy kernel: [12061.100170] PM: early resume of devices complete after 0.298 msecs
Feb 29 11:41:46 roxy kernel: [12061.100223] e1000e 0000:00:1f.6: System wakeup disabled by ACPI
Feb 29 11:41:46 roxy kernel: [12061.104010] parport_pc 00:01: activated
Feb 29 11:41:46 roxy kernel: [12061.105415] serial 00:02: activated
Feb 29 11:41:46 roxy kernel: [12061.105417] rtc_cmos 00:05: System wakeup disabled by ACPI
Feb 29 11:41:46 roxy kernel: [12061.183633] xhci_hcd 0000:00:14.0: port 12 resume PLC timeout
Feb 29 11:41:46 roxy kernel: [12061.408186] usb 1-9: reset high-speed USB device number 2 using xhci_hcd
Feb 29 11:41:46 roxy kernel: [12061.648578] usb 1-13: reset full-speed USB device number 5 using xhci_hcd
Feb 29 11:41:46 roxy kernel: [12061.672431] ata4: SATA link down (SStatus 4 SControl 300)
Feb 29 11:41:46 roxy kernel: [12061.672485] ata5: SATA link down (SStatus 4 SControl 300)
Feb 29 11:41:46 roxy kernel: [12061.672528] ata1: SATA link down (SStatus 4 SControl 300)
Feb 29 11:41:46 roxy kernel: [12061.672556] ata3: SATA link down (SStatus 4 SControl 300)
Feb 29 11:41:46 roxy kernel: [12061.680502] ata6: SATA link down (SStatus 4 SControl 300)
Feb 29 11:41:46 roxy kernel: [12061.680531] ata2: SATA link down (SStatus 4 SControl 300)
Feb 29 11:41:46 roxy kernel: [12061.849087] usb 1-9.1: reset full-speed USB device number 4 using xhci_hcd
Feb 29 11:41:46 roxy kernel: [12062.009383] usb 1-9.4: reset high-speed USB device number 7 using xhci_hcd
Feb 29 11:41:46 roxy kernel: [12062.169653] usb 1-9.3: reset full-speed USB device number 6 using xhci_hcd
Feb 29 11:41:46 roxy kernel: [12062.417951] PM: resume of devices complete after 1315.464 msecs
Feb 29 11:41:46 roxy kernel: [12062.418141] PM: Finishing wakeup.
Feb 29 11:41:46 roxy kernel: [12062.418142] Restarting tasks …
Feb 29 11:41:46 roxy kernel: [12062.418498] pci_bus 0000:04: Allocating resources
Feb 29 11:41:46 roxy kernel: [12062.418516] pci 0000:03:00.0: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 04] add_size 200000 add_align 100000
Feb 29 11:41:46 roxy kernel: [12062.418518] pci 0000:03:00.0: bridge window [mem 0x00100000-0x000fffff] to [bus 04] add_size 200000 add_align 100000
Feb 29 11:41:46 roxy kernel: [12062.418521] pci 0000:03:00.0: res[14]=[mem 0x00100000-0x000fffff] res_to_dev_res add_size 200000 min_align 100000
Feb 29 11:41:46 roxy kernel: [12062.418523] pci 0000:03:00.0: res[14]=[mem 0x00100000-0x002fffff] res_to_dev_res add_size 200000 min_align 100000
Feb 29 11:41:46 roxy kernel: [12062.418527] pci 0000:03:00.0: res[15]=[mem 0x00100000-0x000fffff 64bit pref] res_to_dev_res add_size 200000 min_align 100000
Feb 29 11:41:46 roxy kernel: [12062.418530] pci 0000:03:00.0: res[15]=[mem 0x00100000-0x002fffff 64bit pref] res_to_dev_res add_size 200000 min_align 100000
Feb 29 11:41:46 roxy kernel: [12062.418533] pci 0000:03:00.0: BAR 14: no space for [mem size 0x00200000]
Feb 29 11:41:46 roxy kernel: [12062.418535] pci 0000:03:00.0: BAR 14: failed to assign [mem size 0x00200000]
Feb 29 11:41:46 roxy kernel: [12062.418537] pci 0000:03:00.0: BAR 15: no space for [mem size 0x00200000 64bit pref]
Feb 29 11:41:46 roxy kernel: [12062.418540] pci 0000:03:00.0: BAR 15: failed to assign [mem size 0x00200000 64bit pref]
Feb 29 11:41:46 roxy kernel: [12062.418542] pci 0000:03:00.0: BAR 14: no space for [mem size 0x00200000]
Feb 29 11:41:46 roxy acpid: client 1468[0:0] has disconnected
Feb 29 11:40:20 roxy wpa_supplicant[1665]: message repeated 16 times: [ wlan0: CTRL-EVENT-SCAN-STARTED ]
Feb 29 11:41:46 roxy wpa_supplicant[1665]: wlan0: CTRL-EVENT-DISCONNECTED bssid=ac:9e:17:e9:82:c4 reason=3 locally_generated=1
Feb 29 11:41:46 roxy NetworkManager[1083]: (wlan0): roamed from BSSID AC:9E:17:E9:82:C4 (skansenkronan) to (none) ((none))
Feb 29 11:41:46 roxy NetworkManager[1083]: Connection disconnected (reason -3)
Feb 29 11:41:46 roxy kernel: [12062.418544] pci 0000:03:00.0: BAR 14: failed to assign [mem size 0x00200000]
Feb 29 11:41:46 roxy kernel: [12062.418546] pci 0000:03:00.0: BAR 15: no space for [mem size 0x00200000 64bit pref]
Feb 29 11:41:46 roxy kernel: [12062.418547] pci 0000:03:00.0: BAR 15: failed to assign [mem size 0x00200000 64bit pref]
Feb 29 11:41:46 roxy kernel: [12062.418816] done.
Feb 29 11:41:46 roxy NetworkManager[1083]: (wlan0): supplicant interface state: completed -> disconnected
Feb 29 11:41:46 roxy kernel: [12062.422663] cfg80211: World regulatory domain updated:
Feb 29 11:41:46 roxy kernel: [12062.422665] cfg80211: DFS Master region: unset
Feb 29 11:41:46 roxy kernel: [12062.422666] cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
Feb 29 11:41:46 roxy kernel: [12062.422668] cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm), (N/A)
Feb 29 11:41:46 roxy kernel: [12062.422670] cfg80211: (2457000 KHz - 2482000 KHz @ 40000 KHz), (300 mBi, 2000 mBm), (N/A)
Feb 29 11:41:46 roxy kernel: [12062.422671] cfg80211: (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm), (N/A)
Feb 29 11:41:46 roxy kernel: [12062.422673] cfg80211: (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm), (N/A)
Feb 29 11:41:46 roxy kernel: [12062.422674] cfg80211: (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm), (N/A)
Feb 29 11:41:46 roxy anacron[3712]: Anacron 2.3 started on 2016-02-29
Feb 29 11:41:46 roxy anacron[3712]: Normal exit (0 jobs run)
Feb 29 11:41:46 roxy anacron[3764]: Anacron 2.3 started on 2016-02-29
Feb 29 11:41:46 roxy anacron[3764]: Normal exit (0 jobs run)
Feb 29 11:41:49 roxy kernel: [12065.448390] nvidia-modeset: WARNING: GPU:0: Lost display notification; continuing.
Feb 29 11:41:59 roxy wpa_supplicant[1665]: wlan0: CTRL-EVENT-SCAN-STARTED
Feb 29 11:41:59 roxy NetworkManager[1083]: (wlan0): supplicant interface state: disconnected -> scanning
Feb 29 11:42:00 roxy wpa_supplicant[1665]: wlan0: SME: Trying to authenticate with ac:9e:17:e9:82:c4 (SSID=‘skansenkronan’ freq=2462 MHz)
Feb 29 11:42:00 roxy kernel: [12076.550388] wlan0: authenticate with ac:9e:17:e9:82:c4
Feb 29 11:42:00 roxy NetworkManager[1083]: (wlan0): supplicant interface state: scanning -> authenticating
Feb 29 11:42:00 roxy kernel: [12076.568706] wlan0: direct probe to ac:9e:17:e9:82:c4 (try 1/3)
Feb 29 11:42:00 roxy kernel: [12076.770963] wlan0: direct probe to ac:9e:17:e9:82:c4 (try 2/3)
Feb 29 11:42:00 roxy kernel: [12076.975320] wlan0: send auth to ac:9e:17:e9:82:c4 (try 3/3)
Feb 29 11:42:01 roxy wpa_supplicant[1665]: wlan0: Trying to associate with ac:9e:17:e9:82:c4 (SSID=‘skansenkronan’ freq=2462 MHz)
Feb 29 11:42:01 roxy kernel: [12077.008823] wlan0: authenticated
Feb 29 11:42:01 roxy NetworkManager[1083]: (wlan0): supplicant interface state: authenticating -> associating
Feb 29 11:42:01 roxy kernel: [12077.011423] wlan0: associate with ac:9e:17:e9:82:c4 (try 1/3)
Feb 29 11:42:01 roxy kernel: [12077.037215] wlan0: RX AssocResp from ac:9e:17:e9:82:c4 (capab=0x1411 status=0 aid=5)
Feb 29 11:42:01 roxy wpa_supplicant[1665]: wlan0: Associated with ac:9e:17:e9:82:c4
Feb 29 11:42:01 roxy kernel: [12077.040250] wlan0: associated
Feb 29 11:42:01 roxy NetworkManager[1083]: (wlan0): supplicant interface state: associating -> associated
Feb 29 11:42:01 roxy NetworkManager[1083]: (wlan0): supplicant interface state: associated -> 4-way handshake
Feb 29 11:42:01 roxy NetworkManager[1083]: (wlan0): link timed out.
Feb 29 11:42:01 roxy NetworkManager[1083]: (wlan0): device state change: activated -> failed (reason ‘supplicant-timeout’) [100 120 11]
Feb 29 11:42:01 roxy NetworkManager[1083]: NetworkManager state is now DISCONNECTED
Feb 29 11:42:01 roxy NetworkManager[1083]: Activation (wlan0) failed for connection ‘skansenkronan’
Feb 29 11:42:01 roxy dbus[678]: [system] Activating service name=‘org.freedesktop.nm_dispatcher’ (using servicehelper)
Feb 29 11:42:01 roxy NetworkManager[1083]: (wlan0): device state change: failed -> disconnected (reason ‘none’) [120 30 0]
Feb 29 11:42:01 roxy NetworkManager[1083]: (wlan0): deactivating device (reason ‘none’) [0]
Feb 29 11:42:01 roxy dbus[678]: [system] Successfully activated service ‘org.freedesktop.nm_dispatcher’
Feb 29 11:42:03 roxy wpa_supplicant[1665]: wlan0: WPA: Key negotiation completed with ac:9e:17:e9:82:c4 [PTK=CCMP GTK=CCMP]
Feb 29 11:42:03 roxy wpa_supplicant[1665]: wlan0: CTRL-EVENT-CONNECTED - Connection to ac:9e:17:e9:82:c4 completed [id=0 id_str=]
Feb 29 11:42:04 roxy NetworkManager[1083]: (wlan0): DHCP client pid 1876 didn’t exit, will kill it.
Feb 29 11:42:05 roxy acpid: client connected from 1468[0:0]
Feb 29 11:42:05 roxy acpid: 1 client rule loaded
Feb 29 11:42:06 roxy rtkit-daemon[2650]: Successfully made thread 3865 of process 2646 (n/a) owned by ‘1000’ RT at priority 5.
Feb 29 11:42:06 roxy rtkit-daemon[2650]: Supervising 6 threads of 1 processes of 1 users.
Feb 29 11:42:07 roxy NetworkManager[1083]: (wlan0): canceled DHCP transaction, DHCP client pid 1876
Feb 29 11:42:07 roxy kernel: [12083.483522] wlan0: deauthenticating from ac:9e:17:e9:82:c4 by local choice (Reason: 3=DEAUTH_LEAVING)
Feb 29 11:42:13 roxy wpa_supplicant[1665]: wlan0: CTRL-EVENT-DISCONNECTED bssid=ac:9e:17:e9:82:c4 reason=3 locally_generated=1
Feb 29 11:42:13 roxy avahi-daemon[806]: Withdrawing address record for 192.168.1.29 on wlan0.
Feb 29 11:42:13 roxy avahi-daemon[806]: Leaving mDNS multicast group on interface wlan0.IPv4 with address 192.168.1.29.
Feb 29 11:42:13 roxy NetworkManager[1083]: DNS: plugin dnsmasq update failed
Feb 29 11:42:13 roxy NetworkManager[1083]: Removing DNS information from /sbin/resolvconf
Feb 29 11:42:13 roxy dnsmasq[1896]: setting upstream servers from DBus
Feb 29 11:42:13 roxy avahi-daemon[806]: Interface wlan0.IPv4 no longer relevant for mDNS.
Feb 29 11:42:13 roxy kernel: [12089.503206] cfg80211: World regulatory domain updated:
Feb 29 11:42:13 roxy kernel: [12089.503208] cfg80211: DFS Master region: unset
Feb 29 11:42:13 roxy kernel: [12089.503208] cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
Feb 29 11:42:13 roxy kernel: [12089.503210] cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm), (N/A)
Feb 29 11:42:13 roxy kernel: [12089.503211] cfg80211: (2457000 KHz - 2482000 KHz @ 40000 KHz), (300 mBi, 2000 mBm), (N/A)
Feb 29 11:42:13 roxy kernel: [12089.503211] cfg80211: (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm), (N/A)
Feb 29 11:42:13 roxy kernel: [12089.503212] cfg80211: (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm), (N/A)
Feb 29 11:42:13 roxy kernel: [12089.503213] cfg80211: (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm), (N/A)
Feb 29 11:42:13 roxy NetworkManager[1083]: Auto-activating connection ‘skansenkronan’.
Feb 29 11:42:13 roxy NetworkManager[1083]: Activation (wlan0) starting connection ‘skansenkronan’
Feb 29 11:42:13 roxy NetworkManager[1083]: (wlan0): device state change: disconnected -> prepare (reason ‘none’) [30 40 0]
Feb 29 11:42:13 roxy NetworkManager[1083]: NetworkManager state is now CONNECTING
Feb 29 11:42:13 roxy NetworkManager[1083]: Activation (wlan0) Stage 1 of 5 (Device Prepare) scheduled…
Feb 29 11:42:13 roxy NetworkManager[1083]: (wlan0): supplicant interface state: 4-way handshake -> completed
Feb 29 11:42:13 roxy wpa_supplicant[1665]: wlan0: CTRL-EVENT-SCAN-STARTED
Feb 29 11:42:13 roxy NetworkManager[1083]: Connection disconnected (reason -3)
Feb 29 11:42:13 roxy NetworkManager[1083]: (wlan0): supplicant interface state: completed -> disconnected
Feb 29 11:42:13 roxy NetworkManager[1083]: Connection disconnected (reason -3)
Feb 29 11:42:13 roxy NetworkManager[1083]: Activation (wlan0) Stage 1 of 5 (Device Prepare) started…
Feb 29 11:42:13 roxy NetworkManager[1083]: Activation (wlan0) Stage 2 of 5 (Device Configure) scheduled…
Feb 29 11:42:13 roxy NetworkManager[1083]: Activation (wlan0) Stage 1 of 5 (Device Prepare) complete.
Feb 29 11:42:13 roxy NetworkManager[1083]: Activation (wlan0) Stage 2 of 5 (Device Configure) starting…
Feb 29 11:42:13 roxy NetworkManager[1083]: (wlan0): device state change: prepare -> config (reason ‘none’) [40 50 0]
Feb 29 11:42:13 roxy NetworkManager[1083]: Activation (wlan0/wireless): access point ‘skansenkronan’ has security, but secrets are required.
Feb 29 11:42:13 roxy NetworkManager[1083]: (wlan0): device state change: config -> need-auth (reason ‘none’) [50 60 0]
Feb 29 11:42:13 roxy NetworkManager[1083]: Activation (wlan0) Stage 2 of 5 (Device Configure) complete.
Feb 29 11:42:13 roxy NetworkManager[1083]: Activation (wlan0) Stage 1 of 5 (Device Prepare) scheduled…
Feb 29 11:42:13 roxy NetworkManager[1083]: Activation (wlan0) Stage 1 of 5 (Device Prepare) started…
Feb 29 11:42:13 roxy NetworkManager[1083]: (wlan0): device state change: need-auth -> prepare (reason ‘none’) [60 40 0]
Feb 29 11:42:13 roxy NetworkManager[1083]: Activation (wlan0) Stage 2 of 5 (Device Configure) scheduled…
Feb 29 11:42:13 roxy NetworkManager[1083]: Activation (wlan0) Stage 1 of 5 (Device Prepare) complete.
Feb 29 11:42:13 roxy NetworkManager[1083]: Activation (wlan0) Stage 2 of 5 (Device Configure) starting…
Feb 29 11:42:13 roxy NetworkManager[1083]: (wlan0): device state change: prepare -> config (reason ‘none’) [40 50 0]
Feb 29 11:42:13 roxy NetworkManager[1083]: Activation (wlan0/wireless): connection ‘skansenkronan’ has security, and secrets exist. No new secrets needed.
Feb 29 11:42:13 roxy NetworkManager[1083]: Config: added ‘ssid’ value ‘skansenkronan’
Feb 29 11:42:13 roxy NetworkManager[1083]: Config: added ‘scan_ssid’ value ‘1’
Feb 29 11:42:13 roxy NetworkManager[1083]: Config: added ‘key_mgmt’ value ‘WPA-PSK’
Feb 29 11:42:13 roxy NetworkManager[1083]: Config: added ‘auth_alg’ value ‘OPEN’
Feb 29 11:42:13 roxy NetworkManager[1083]: Config: added ‘psk’ value ‘’
Feb 29 11:42:13 roxy NetworkManager[1083]: Activation (wlan0) Stage 2 of 5 (Device Configure) complete.
Feb 29 11:42:13 roxy NetworkManager[1083]: Config: set interface ap_scan to 1
Feb 29 11:42:14 roxy wpa_supplicant[1665]: wlan0: SME: Trying to authenticate with ac:9e:17:e9:82:c4 (SSID=‘skansenkronan’ freq=2462 MHz)
Feb 29 11:42:14 roxy kernel: [12090.574755] wlan0: authenticate with ac:9e:17:e9:82:c4
Feb 29 11:42:14 roxy NetworkManager[1083]: (wlan0): supplicant interface state: disconnected -> authenticating
Feb 29 11:42:14 roxy kernel: [12090.593341] wlan0: send auth to ac:9e:17:e9:82:c4 (try 1/3)
Feb 29 11:42:14 roxy wpa_supplicant[1665]: wlan0: Trying to associate with ac:9e:17:e9:82:c4 (SSID=‘skansenkronan’ freq=2462 MHz)
Feb 29 11:42:14 roxy kernel: [12090.598453] wlan0: authenticated
Feb 29 11:42:14 roxy kernel: [12090.599224] wlan0: associate with ac:9e:17:e9:82:c4 (try 1/3)
Feb 29 11:42:14 roxy NetworkManager[1083]: (wlan0): supplicant interface state: authenticating -> associating
Feb 29 11:42:14 roxy kernel: [12090.616707] wlan0: RX AssocResp from ac:9e:17:e9:82:c4 (capab=0x1411 status=0 aid=5)
Feb 29 11:42:14 roxy wpa_supplicant[1665]: wlan0: Associated with ac:9e:17:e9:82:c4
Feb 29 11:42:14 roxy kernel: [12090.619577] wlan0: associated
Feb 29 11:42:14 roxy NetworkManager[1083]: (wlan0): supplicant interface state: associating -> associated
Feb 29 11:42:14 roxy NetworkManager[1083]: (wlan0): supplicant interface state: associated -> 4-way handshake
Feb 29 11:42:14 roxy wpa_supplicant[1665]: wlan0: WPA: Key negotiation completed with ac:9e:17:e9:82:c4 [PTK=CCMP GTK=CCMP]
Feb 29 11:42:14 roxy wpa_supplicant[1665]: wlan0: CTRL-EVENT-CONNECTED - Connection to ac:9e:17:e9:82:c4 completed [id=0 id_str=]
Feb 29 11:42:14 roxy NetworkManager[1083]: (wlan0): supplicant interface state: 4-way handshake -> completed
Feb 29 11:42:14 roxy NetworkManager[1083]: Activation (wlan0/wireless) Stage 2 of 5 (Device Configure) successful. Connected to wireless network ‘skansenkronan’.
Feb 29 11:42:14 roxy NetworkManager[1083]: Activation (wlan0) Stage 3 of 5 (IP Configure Start) scheduled.
Feb 29 11:42:14 roxy NetworkManager[1083]: Activation (wlan0) Stage 3 of 5 (IP Configure Start) started…
Feb 29 11:42:14 roxy NetworkManager[1083]: (wlan0): device state change: config -> ip-config (reason ‘none’) [50 70 0]
Feb 29 11:42:14 roxy NetworkManager[1083]: Activation (wlan0) Beginning DHCPv4 transaction (timeout in 45 seconds)
Feb 29 11:42:14 roxy NetworkManager[1083]: dhclient started with pid 3893
Feb 29 11:42:14 roxy NetworkManager[1083]: Activation (wlan0) Beginning IP6 addrconf.
Feb 29 11:42:14 roxy avahi-daemon[806]: Withdrawing address record for fe80::7edd:90ff:fe81:32b8 on wlan0.
Feb 29 11:42:14 roxy avahi-daemon[806]: Leaving mDNS multicast group on interface wlan0.IPv6 with address fe80::7edd:90ff:fe81:32b8.
Feb 29 11:42:14 roxy avahi-daemon[806]: Interface wlan0.IPv6 no longer relevant for mDNS.
Feb 29 11:42:14 roxy NetworkManager[1083]: Activation (wlan0) Stage 3 of 5 (IP Configure Start) complete.
Feb 29 11:42:14 roxy dhclient: Internet Systems Consortium DHCP Client 4.2.4
Feb 29 11:42:14 roxy dhclient: Copyright 2004-2012 Internet Systems Consortium.
Feb 29 11:42:14 roxy dhclient: All rights reserved.
Feb 29 11:42:14 roxy dhclient: For info, please visit https://www.isc.org/software/dhcp/
Feb 29 11:42:14 roxy dhclient:
Feb 29 11:42:14 roxy NetworkManager[1083]: (wlan0): DHCPv4 state changed nbi -> preinit
Feb 29 11:42:16 roxy avahi-daemon[806]: Joining mDNS multicast group on interface wlan0.IPv6 with address fe80::7edd:90ff:fe81:32b8.
Feb 29 11:42:16 roxy avahi-daemon[806]: New relevant interface wlan0.IPv6 for mDNS.
Feb 29 11:42:16 roxy avahi-daemon[806]: Registering new address record for fe80::7edd:90ff:fe81:32b8 on wlan0.*.
Feb 29 11:42:21 roxy dhclient: Listening on LPF/wlan0/7c:dd:90:81:32:b8
Feb 29 11:42:21 roxy dhclient: Sending on LPF/wlan0/7c:dd:90:81:32:b8
Feb 29 11:42:21 roxy dhclient: Sending on Socket/fallback
Feb 29 11:42:21 roxy dhclient: DHCPREQUEST of 192.168.1.29 on wlan0 to 255.255.255.255 port 67 (xid=0x6e87c4c3)
Feb 29 11:42:21 roxy dhclient: DHCPACK of 192.168.1.29 from 192.168.1.1
Feb 29 11:42:21 roxy dhclient: bound to 192.168.1.29 – renewal in 39411 seconds.
Feb 29 11:42:21 roxy NetworkManager[1083]: (wlan0): DHCPv4 state changed preinit -> reboot
Feb 29 11:42:21 roxy NetworkManager[1083]: address 192.168.1.29
Feb 29 11:42:21 roxy NetworkManager[1083]: prefix 24 (255.255.255.0)
Feb 29 11:42:21 roxy NetworkManager[1083]: gateway 192.168.1.1
Feb 29 11:42:21 roxy NetworkManager[1083]: hostname ‘roxy’
Feb 29 11:42:21 roxy NetworkManager[1083]: nameserver ‘192.168.1.1’
Feb 29 11:42:21 roxy NetworkManager[1083]: Activation (wlan0) Stage 5 of 5 (IPv4 Configure Commit) scheduled…
Feb 29 11:42:21 roxy NetworkManager[1083]: Activation (wlan0) Stage 5 of 5 (IPv4 Commit) started…
Feb 29 11:42:21 roxy avahi-daemon[806]: Joining mDNS multicast group on interface wlan0.IPv4 with address 192.168.1.29.
Feb 29 11:42:21 roxy avahi-daemon[806]: New relevant interface wlan0.IPv4 for mDNS.
Feb 29 11:42:21 roxy avahi-daemon[806]: Registering new address record for 192.168.1.29 on wlan0.IPv4.
Feb 29 11:42:22 roxy NetworkManager[1083]: (wlan0): device state change: ip-config -> secondaries (reason ‘none’) [70 90 0]
Feb 29 11:42:22 roxy NetworkManager[1083]: Activation (wlan0) Stage 5 of 5 (IPv4 Commit) complete.
Feb 29 11:42:22 roxy NetworkManager[1083]: (wlan0): device state change: secondaries -> activated (reason ‘none’) [90 100 0]
Feb 29 11:42:22 roxy NetworkManager[1083]: NetworkManager state is now CONNECTED_GLOBAL
Feb 29 11:42:22 roxy NetworkManager[1083]: Policy set ‘skansenkronan’ (wlan0) as default for IPv4 routing and DNS.
Feb 29 11:42:22 roxy NetworkManager[1083]: Writing DNS information to /sbin/resolvconf
Feb 29 11:42:22 roxy dnsmasq[1896]: setting upstream servers from DBus
Feb 29 11:42:22 roxy dnsmasq[1896]: using nameserver 192.168.1.1#53
Feb 29 11:42:22 roxy NetworkManager[1083]: Activation (wlan0) successful, device activated.
Feb 29 11:42:22 roxy dbus[678]: [system] Activating service name=‘org.freedesktop.nm_dispatcher’ (using servicehelper)
Feb 29 11:42:22 roxy dbus[678]: [system] Successfully activated service ‘org.freedesktop.nm_dispatcher’
Feb 29 11:42:25 roxy kernel: [12101.526289] nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device Samsung S24D390 (HDMI-0)
Feb 29 11:42:29 roxy ntpdate[3974]: step time server 91.189.94.4 offset 0.789946 sec
Feb 29 11:42:36 roxy NetworkManager[1083]: (wlan0): IP6 addrconf timed out or failed.
Feb 29 11:42:36 roxy NetworkManager[1083]: Activation (wlan0) Stage 4 of 5 (IPv6 Configure Timeout) scheduled…
Feb 29 11:42:36 roxy NetworkManager[1083]: Activation (wlan0) Stage 4 of 5 (IPv6 Configure Timeout) started…
Feb 29 11:42:36 roxy NetworkManager[1083]: Activation (wlan0) Stage 4 of 5 (IPv6 Configure Timeout) complete.
Feb 29 11:42:41 roxy wpa_supplicant[1665]: wlan0: CTRL-EVENT-SCAN-STARTED
Feb 29 11:45:53 roxy kernel: [12309.490098] nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device Samsung S24D390 (HDMI-0)
Feb 29 11:46:53 roxy kernel: [12369.644043] nvidia-modeset: ERROR: GPU:0: Idling EVO timed out: 0x0000957d:0:0:0x00000040
Feb 29 11:46:57 roxy kernel: [12373.651104] nvidia-modeset: ERROR: GPU:0: Idling EVO timed out: 0x0000917e:0:0:0x00000001
Feb 29 11:47:01 roxy kernel: [12377.658218] nvidia-modeset: ERROR: GPU:0: Idling EVO timed out: 0x0000927c:0:0:0x00000001
Feb 29 11:47:05 roxy kernel: [12381.665353] nvidia-modeset: ERROR: GPU:0: Idling EVO timed out: 0x0000917e:1:0:0x00000001
Feb 29 11:47:09 roxy kernel: [12385.672540] nvidia-modeset: ERROR: GPU:0: Idling EVO timed out: 0x0000927c:1:0:0x00000001
Feb 29 11:47:13 roxy kernel: [12389.679708] nvidia-modeset: ERROR: GPU:0: Idling EVO timed out: 0x0000917e:2:0:0x00000001
Feb 29 11:47:17 roxy kernel: [12393.686848] nvidia-modeset: ERROR: GPU:0: Idling EVO timed out: 0x0000927c:2:0:0x00000001
Feb 29 11:47:21 roxy kernel: [12397.693955] nvidia-modeset: ERROR: GPU:0: Idling EVO timed out: 0x0000917e:3:0:0x00000001
Feb 29 11:47:25 roxy kernel: [12401.701095] nvidia-modeset: ERROR: GPU:0: Idling EVO timed out: 0x0000927c:3:0:0x00000001
Feb 29 11:47:29 roxy kernel: [12405.751840] nvidia-modeset: ERROR: GPU:0: Idling EVO timed out: 0x0000957d:0:0:0x00000040
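
The "Idling EVO timed out" errors above appear to repeat at a steady interval. A minimal Python sketch for checking that cadence from a syslog or dmesg excerpt follows; the function names `idle_timeouts` and `gaps` are made up for illustration, and the regular expression assumes the bracketed kernel timestamp format seen in this log.

```python
import re

# Matches the bracketed kernel timestamp and the error code on an
# nvidia-modeset idle-timeout line (both wordings seen in this thread).
PATTERN = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+nvidia-modeset: ERROR: GPU:\d+: "
    r"Idling (?:EVO|display engine) timed out: (?P<code>\S+)"
)

def idle_timeouts(lines):
    """Return (kernel_timestamp, error_code) for each idle-timeout line."""
    hits = []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            hits.append((float(m.group("ts")), m.group("code")))
    return hits

def gaps(hits):
    """Seconds between consecutive errors, rounded to 0.1 s."""
    times = [ts for ts, _ in hits]
    return [round(b - a, 1) for a, b in zip(times, times[1:])]

# First three error lines copied from the syslog above.
sample = [
    "Feb 29 11:46:53 roxy kernel: [12369.644043] nvidia-modeset: ERROR: GPU:0: Idling EVO timed out: 0x0000957d:0:0:0x00000040",
    "Feb 29 11:46:57 roxy kernel: [12373.651104] nvidia-modeset: ERROR: GPU:0: Idling EVO timed out: 0x0000917e:0:0:0x00000001",
    "Feb 29 11:47:01 roxy kernel: [12377.658218] nvidia-modeset: ERROR: GPU:0: Idling EVO timed out: 0x0000927c:0:0:0x00000001",
]
print(gaps(idle_timeouts(sample)))  # [4.0, 4.0]
```

On these lines the errors are about four seconds apart. Journal output without the bracketed kernel timestamp would need a different pattern.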

Crash log from 352 driver.
I have not been able to get one with the 361 driver as the system is just not responsive enough after a crash.
nvidia-bug-report.log.gz (115 KB)

I’m not allowing the system to suspend anymore and have no problems.
I would like to be able to use suspend, as the system is now pulling 50 watts at idle around the clock.
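
To put the idle draw in perspective, here is a small arithmetic sketch in Python. The 50 W figure is from the post; the assumption that the machine idles every hour of the year is a simplification, and `annual_kwh` is just an illustrative helper name.

```python
IDLE_WATTS = 50              # idle draw reported in the post
HOURS_PER_YEAR = 24 * 365    # assumes the machine never suspends or powers off

def annual_kwh(watts, hours=HOURS_PER_YEAR):
    """Energy used in a year at a constant draw, in kilowatt-hours."""
    return watts * hours / 1000

print(annual_kwh(IDLE_WATTS))  # 438.0
```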

Still a problem with kernel 4.4.0-36-generic and nvidia 370.23.

I have had the same or a similar problem on my Arch Linux system since swapping from a GeForce GTX 750 to a GeForce GTX 970 (same driver/kernel/OS as before). Sometimes the system does not wake up from suspend (the monitor wakes up but stays black, and the system becomes completely unresponsive).

System specs:

OS: Arch Linux
Kernel: 4.7.2-1-ARCH
NVIDIA driver: 370.23-4
NVIDIA card: 01:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1)
Monitor: Hitachi/HINT W240D (connected via DVI)

Because the system becomes unresponsive after resuming from suspend, I could not generate an nvidia-bug-report.log.gz after the problem occurred.
Here are a few lines from journalctl shortly before the problem occurs (the whole log from today is attached, as well as an nvidia-bug-report.log.gz taken after rebooting the system):

Sep 03 15:34:57 Antiphon avahi-daemon[352]: New relevant interface eno1.IPv6 for mDNS.
Sep 03 15:34:57 Antiphon avahi-daemon[352]: Registering new address record for fe80::6031:f025:6f21:b469 on eno1.*.
Sep 03 15:34:58 Antiphon ntpd[378]: Listen normally on 12 eno1 192.168.1.102:123
Sep 03 15:34:58 Antiphon ntpd[378]: Listen normally on 13 eno1 [fe80::6031:f025:6f21:b469%2]:123
Sep 03 15:34:58 Antiphon ntpd[378]: new interface(s) found: waking up resolver
Sep 03 15:34:59 Antiphon kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 03 15:34:59 Antiphon kernel: ata4.00: configured for UDMA/133
Sep 03 15:35:11 Antiphon acpid[346]: client connected from 379[0:0]
Sep 03 15:35:11 Antiphon acpid[346]: 1 client rule loaded
Sep 03 15:35:12 Antiphon root[3948]: ACPI group/action undefined: jack/lineout / LINEOUT
Sep 03 15:35:12 Antiphon root[3950]: ACPI group/action undefined: jack/videoout / VIDEOOUT
Sep 03 15:35:31 Antiphon root[3952]: ACPI group/action undefined: jack/lineout / LINEOUT
Sep 03 15:35:31 Antiphon root[3954]: ACPI group/action undefined: jack/videoout / VIDEOOUT
Sep 03 15:36:11 Antiphon kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000957d:0:0:0x00000040
Sep 03 15:36:15 Antiphon kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000917e:0:0:0x00000001
Sep 03 15:36:19 Antiphon kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000927c:0:0:0x00000001
Sep 03 15:36:23 Antiphon kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000917e:1:0:0x00000001
Sep 03 15:36:27 Antiphon kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000927c:1:0:0x00000001
Sep 03 15:36:31 Antiphon kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000917e:2:0:0x00000001
Sep 03 15:36:35 Antiphon kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000927c:2:0:0x00000001
Sep 03 15:36:39 Antiphon kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000917e:3:0:0x00000001
Sep 03 15:36:43 Antiphon kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000927c:3:0:0x00000001
-- Reboot --

As I did not find a way to add an attachment, I’ve put the files on an external host:
journalctl.log from the whole day http://pastebin.com/2R4JMxrc
nvidia-bug-report.log.gz http://s000.tinyupload.com/index.php?file_id=08086779656445978052

Latest test with kernel 4.4 and driver 370.28
The display is black on resume from suspend. The system is responsive enough to allow a remote login over SSH, so I can get to diagnostics.

Xorg is hanging at 100% CPU.

lsmod shows that nouveau is NOT loaded.
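
For anyone who wants to script the same check, here is a minimal Python sketch that reads `/proc/modules`, the file whose contents lsmod formats. The helper names are hypothetical, and the sample text is illustrative only; it is not output from the machine in this thread.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names from /proc/modules-formatted text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }

def read_loaded_modules(path="/proc/modules"):
    """Read the live module list (Linux only)."""
    with open(path) as f:
        return loaded_modules(f.read())

# Illustrative sample in /proc/modules format (made-up sizes and counts).
sample = """\
nvidia_modeset 1040384 3 - Live 0x0000000000000000
nvidia 12000000 50 nvidia_modeset, Live 0x0000000000000000
e1000e 245760 0 - Live 0x0000000000000000
"""

mods = loaded_modules(sample)
print("nouveau loaded:", "nouveau" in mods)
```

On a real system, `"nouveau" in read_loaded_modules()` would give the same answer as looking for nouveau in the lsmod output.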

Potentially interesting excerpts from the logs:
[ 138.404188] nvidia-modeset: WARNING: GPU:0: Lost display notification; continuing.
[ 140.406359] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 140.406420] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s31f6: link becomes ready
[ 181.619021] nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device Samsung S24D390 (HDMI-0)
[ 389.665921] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000957d:0:0:0x00000040

I got a bugreport from nvidia-bug-report.sh
nvidia-bug-report.log.gz (56.9 KB)

I have the same issue as mlaggner on my desktop. At some point it seemed like some driver versions fixed the issue, but it came back in the more recent ones (also upgraded from a 560ti -> 970). Next time it happens I can try to SSH in, too.

I tried hibernation to get more information for troubleshooting.

Trying to hibernate just crashes, and I can’t even get a log of it happening.

Have you tried removing all other expansion cards to determine if a card conflict is at play?

Back when I was using an Asus GTX750-DCSL-2GD5, a StarTech PEX2IDE and a StarTech PCI300WN2X2 with an Asus SABERTOOTH 990FX R2.0 (UEFI 2501 at the time), resume from suspend worked fine.

But when I upgraded to an Asus STRIX-GTX960-DC2OC-4GD5 it no longer functioned–that is until I temporarily removed the superfluous StarTech cards, found that resume from suspend worked fine with the '960 after all and then reintroduced the other cards one-at-a-time until I identified the PCI300WN2X2 as the culprit. Swapping out that PCI card for the PCIe-based StarTech PEX300WN2X2 resolved the wake from suspend issue.

Then about a couple of months ago Asus released UEFI 2901 for the Sabertooth (the first UEFI update that 'board has had in over two years). So I applied it and then re-tested the PCI300WN2X2 with the STRIX-GTX960-DC2OC-4GD5 only to discover that wake from suspend now functions correctly with that combo.

The take-away:

Expansion card conflicts and UEFI updates can have an influence on wake from suspend functionality.

Thank you for your input. I appreciate all advice.

I have tried every BIOS update that has been released for the card and have not seen any effect on the symptoms.

I did now remove all expansion cards as suggested.

When I test by resuming immediately after a successful standby, the system sometimes resumes and sometimes does not.
I’m not sure about the exact success rate, but it seems to be around 25%.
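
To put a number on "around 25%", one option is to log each suspend/resume attempt and compute a confidence interval for the success rate. Below is a small sketch using the Wilson score interval; the counts in the demo (5 successes in 20 attempts) are hypothetical and only chosen to match a 25% rate, they are not measurements from this machine.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a success proportion.

    z=1.96 corresponds to roughly 95% confidence.
    Returns (lower, upper) as fractions between 0 and 1.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    centre = (p + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Hypothetical tally: 5 successful resumes out of 20 attempts.
    low, high = wilson_interval(5, 20)
    print(f"success rate 25%, 95% interval {low:.0%} to {high:.0%}")
```

With only a handful of attempts the interval is wide, so a change from "50%" to "25%" between two configurations may not mean much until many more cycles have been recorded.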

Is there anything more that can be disabled, maybe in BIOS to simplify the situation?

What really bothers me about this is that I paid for one of Nvidia’s premium products, and during the almost a year that I’ve been asking for help here, there has not even been a hello from Nvidia’s representatives or any advice about how to further diagnose the problem. There’s just complete silence from Nvidia.

Just sad.

$ lspci
00:00.0 Host bridge: Intel Corporation Sky Lake Host Bridge/DRAM Registers (rev 07)
00:01.0 PCI bridge: Intel Corporation Sky Lake PCIe Controller (x16) (rev 07)
00:14.0 USB controller: Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller (rev 31)
00:16.0 Communication controller: Intel Corporation Sunrise Point-H CSME HECI #1 (rev 31)
00:17.0 SATA controller: Intel Corporation Sunrise Point-H SATA controller [AHCI mode] (rev 31)
00:1b.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Root Port #17 (rev f1)
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #1 (rev f1)
00:1d.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #9 (rev f1)
00:1f.0 ISA bridge: Intel Corporation Sunrise Point-H LPC Controller (rev 31)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-H PMC (rev 31)
00:1f.3 Audio device: Intel Corporation Sunrise Point-H HD Audio (rev 31)
00:1f.4 SMBus: Intel Corporation Sunrise Point-H SMBus (rev 31)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-V (rev 31)
01:00.0 VGA compatible controller: NVIDIA Corporation GM200 [GeForce GTX 980 Ti] (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 0fb0 (rev a1)
04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller (rev 01)

  1. Is the UEFI / BIOS on your PC’s motherboard up-to-date?

“Version 2002, 2016/09/29”

Z170M-PLUS | Motherboards | ASUS Global
https://www.asus.com/Motherboards/Z170M-PLUS/

E10960_BIOS_Update_en.pdf
http://dlcdnet.asus.com/pub/ASUS/mb/qsguide/E10960_BIOS_Update_en.pdf?_ga=1.98257503.1587441999.1477231416

BTW, before flashing a motherboard’s UEFI / BIOS, first read the relevant section in the 'board’s user manual, then boot into the UEFI / BIOS, load ‘Setup Defaults’ (or however it’s worded) and reboot before shutting the PC down.

  2. Check out the link in my signature and scroll down to ‘Clearing the CMOS’.

  3. If the above approaches fail to remedy the resume from suspend issue, then if possible try the GTX 980 Ti in the Z170M-PLUS’ lower 16 lane PCIe slot (after once again clearing the CMOS).

  4. What’s the make and model # of your PC’s power supply?

Yes. I am running BIOS version 2002, which is the latest listed for this Z170M-PLUS motherboard.

On your suggestion, I followed the manual’s instructions for manually resetting the CMOS by shorting the pins, and was later greeted by a BIOS setup wizard, as the CMOS had been successfully cleared.

No. I’ve tried mounting the card in the lower PCIex16 slot but the card physically conflicts with several of the connectors that are mounted on that edge of the motherboard, which is sad as I wanted this flexibility in my system when I planned it. Apparently EVGA, Nvidia or Asus did not fully agree on specs and usability of the standards here.

The system is powered by a Corsair RM650i PSU.

I have an ASUS ROG G751 laptop, and for the last few months the nvidia driver has crashed on resume. I run Arch Linux and update weekly. I can normally get about 5-7 successful resumes before a crash. It doesn’t matter whether I’m on battery or A/C, or whether I’m on a text console or in X. My BIOS is current.

Wifi will break, but if I have an ethernet cable plugged in and am fast, I can get a dmesg before the machine hangs. The tail of it is similar to the previously posted logs with the GPU idle errors.

00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor DRAM Controller (rev 06)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06)
00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 05)
00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 (rev 05)
00:1b.0 Audio device: Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller (rev 05)
00:1c.0 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 (rev d5)
00:1c.2 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #3 (rev d5)
00:1c.3 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #4 (rev d5)
00:1d.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation HM87 Express LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] (rev 05)
00:1f.3 SMBus: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller (rev 05)
01:00.0 VGA compatible controller: NVIDIA Corporation GM204M [GeForce GTX 980M] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GM204 High Definition Audio Controller (rev a1)
3b:00.0 Network controller: Intel Corporation Wireless 7260 (rev bb)
3c:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 10)

01:00.0 VGA compatible controller: NVIDIA Corporation GM204M [GeForce GTX 980M] (rev a1) (prog-if 00 [VGA controller])
Subsystem: ASUSTeK Computer Inc. Device 22da
Flags: bus master, fast devsel, latency 0, IRQ 32
Memory at ec000000 (32-bit, non-prefetchable)
Memory at c0000000 (64-bit, prefetchable)
Memory at d0000000 (64-bit, prefetchable)
I/O ports at e000
[virtual] Expansion ROM at 000c0000 [disabled]
Capabilities:
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_drm, nvidia
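
Note that the `Kernel modules:` line only lists modules that could bind to the device, while `Kernel driver in use:` shows what is actually bound. A small sketch to pull the two apart from a verbose lspci block like the one above (the function name `driver_info` is made up for illustration):

```python
def driver_info(lspci_block):
    """Extract the bound driver and the candidate modules from one lspci device block."""
    info = {"in_use": None, "candidates": []}
    for line in lspci_block.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue
        if key == "Kernel driver in use":
            info["in_use"] = value.strip()
        elif key == "Kernel modules":
            info["candidates"] = [m.strip() for m in value.split(",") if m.strip()]
    return info


if __name__ == "__main__":
    # Lines copied from the lspci output above.
    block = """01:00.0 VGA compatible controller: NVIDIA Corporation GM204M [GeForce GTX 980M] (rev a1) (prog-if 00 [VGA controller])
Subsystem: ASUSTeK Computer Inc. Device 22da
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_drm, nvidia
"""
    info = driver_info(block)
    print("bound driver:", info["in_use"])
    print("nouveau is only a candidate:",
          "nouveau" in info["candidates"] and info["in_use"] != "nouveau")
```

So nouveau showing up in that last line does not by itself mean it is loaded or driving the card.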

Right then. If none of the remedial suggestions I’ve made have yielded a consistently functioning resume from suspend then we are now officially grasping at straws (unless someone else has a further insight into this issue).

According to the following article your RM650i was made by CWT, a respected OEM so there should be no problem with its quality.

July 20, 2016
Corsair RM650x PSU Review - Tom’s Hardware
http://www.tomshardware.com/reviews/corsair-rm650x-psu,4611.html

:: Channel Well Technology Co.,Ltd. ::
http://www.cwt.com.tw/

I’m assuming that you researched the Wattage of the power supply you would require to satisfy your current PC’s configuration plus a little extra to accommodate any reasonable expansion? If not:

EVGA - Power Meter
http://www.evga.com/power-meter/

From the product page for Version 352.55 (the oldest nVidia driver that supports the GTX 980 Ti)

[i]"Known Issues with this release:

  • Resuming from suspend may not be reliable on GeForce GTX 9xx boards in some configurations."[/i]

Drivers | GeForce
http://www.geforce.com/drivers/results/92826

It seems you have one such configuration and that to save on idle power consumption you’ll have to shut your machine down or if possible schedule it to do so after a user defined period of inactivity.

BTW. Installing mate-themes via the Synaptic Package Manager will yield an attractive charcoal theme called BlackMATE and installing grml-rescueboot will allow you to loop-mount Linux Mint .iso images so that installing a fresh OS can occur at HDD or SSD speeds.

Grub2/ISOBoot - Community Ubuntu Documentation
https://help.ubuntu.com/community/Grub2/ISOBoot

One more point to consider:

Though it employs an nVidia GPU, your EVGA 980 ti hybrid is still an EVGA product sporting EVGA’s custom PCB, VRM and firmware / BIOS all of which differ from a card specifically manufactured by nVidia or any of the other nVidia-based graphics card manufacturers. Perhaps starting a help thread on the EVGA Forums may draw in some further user insight into resolving the resume from suspend issue:

EVGA GeForce 900/TITAN X Series - EVGA Forums
http://forums.evga.com/EVGA-GeForce-900TITAN-X-Series-f99.aspx

To save from going over covered ground you could quote, copy and paste selected portions of this thread to more quickly bring EVGA forum members up-to-speed re which steps have already been taken.

FWIW

I suspect that how a *motherboard’s UEFI / BIOS’ ‘Power’ and ‘DDR power down mode’ and ‘S3 Video Repost’ sections (or however they’re worded) are adjusted may influence the effectiveness of the following info:

pm-suspend(8): Suspend/Hibernate your computer - Linux man page
https://linux.die.net/man/8/pm-suspend

Power management/Suspend and hibernate - ArchWiki
https://wiki.archlinux.org/index.php/Power_management/Suspend_and_hibernate

UnderstandingSuspend - Ubuntu Wiki
https://wiki.ubuntu.com/UnderstandingSuspend

(Power Management S3 Tricks and Tips)
Kernel/Reference/S3 - Ubuntu Wiki
https://wiki.ubuntu.com/Kernel/Reference/S3

*EDIT

Some clues from your motherboard’s .pdf manual:

Page 66:

[i]  • ‘Native ASPM [Disabled]’
  • ‘DMI Link ASPM Control [Disabled]’
  • ‘ASPM Support [Disabled]’[/i]

Page 67:

[i]  • ‘DMI Link ASPM Control [Disabled]’
  • ‘PEG ASPM [Disabled]’[/i]

Page 71:

[i]  • ‘ErP Ready [Disabled]’
  • ‘Deep S4 [Disabled]’[/i]

E10768__Z170M-PLUS_UM_WEB.pdf
http://dlcdnet.asus.com/pub/ASUS/mb/LGA1151/Z170M-PLUS/E10768__Z170M-PLUS_UM_WEB.pdf?_ga=1.262171084.1587441999.1477231416

Z170M-PLUS | Motherboards | ASUS Global
https://www.asus.com/Motherboards/Z170M-PLUS/
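
The ASPM settings quoted from the manual have a kernel-side counterpart that can be checked without rebooting into the UEFI. On kernels built with PCIe ASPM support, `/sys/module/pcie_aspm/parameters/policy` lists the available policies with the active one in square brackets. Here is a rough sketch for reading it; the helper names are made up, and whether ASPM has anything to do with this particular resume failure is only a guess.

```python
import re


def active_choice(sysfs_text):
    """Return the bracketed entry from text like '[default] performance powersave'."""
    m = re.search(r"\[([^\]]+)\]", sysfs_text)
    return m.group(1) if m else None


def read_aspm_policy(path="/sys/module/pcie_aspm/parameters/policy"):
    """Return the active ASPM policy, or None if the file is missing or unreadable."""
    try:
        with open(path) as f:
            return active_choice(f.read())
    except OSError:
        return None


if __name__ == "__main__":
    policy = read_aspm_policy()
    if policy is None:
        print("ASPM policy not available on this kernel")
    else:
        print("active ASPM policy:", policy)
```

For a quick experiment, ASPM can also be turned off from the kernel side by booting with `pcie_aspm=off` on the kernel command line, which is easier to undo than a round of UEFI changes.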

Many thanks for all the hints and tips. Going through all the options thoroughly turned out to be a bit complicated, so I’ll make this a weekend project.
It’s my production system so I don’t want to leave it in a non-working state.

I’ve also got another hardware platform made available and will install the card there just to see if the behavior is similar.

A production machine? Do you have a backup graphics card in case ESD or another unforeseen disaster strikes? Murphy’s law being what it is.