I have an ASUS ROG G751 laptop, and for the last few months the nvidia driver has been crashing on resume. I run Arch Linux and update weekly. I can normally get about 5-7 successful resumes before a crash. It doesn’t matter whether I’m on battery or AC, or whether I’m on a text console or in X. My BIOS is current.
Wifi will break, but if I have an ethernet cable plugged in and am fast, I can get a dmesg before the machine hangs. The tail of it is similar to previously posted logs for the GPU idle errors.
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor DRAM Controller (rev 06)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06)
00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 05)
00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 (rev 05)
00:1b.0 Audio device: Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller (rev 05)
00:1c.0 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 (rev d5)
00:1c.2 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #3 (rev d5)
00:1c.3 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #4 (rev d5)
00:1d.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation HM87 Express LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] (rev 05)
00:1f.3 SMBus: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller (rev 05)
01:00.0 VGA compatible controller: NVIDIA Corporation GM204M [GeForce GTX 980M] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GM204 High Definition Audio Controller (rev a1)
3b:00.0 Network controller: Intel Corporation Wireless 7260 (rev bb)
3c:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 10)
01:00.0 VGA compatible controller: NVIDIA Corporation GM204M [GeForce GTX 980M] (rev a1) (prog-if 00 [VGA controller])
Subsystem: ASUSTeK Computer Inc. Device 22da
Flags: bus master, fast devsel, latency 0, IRQ 32
Memory at ec000000 (32-bit, non-prefetchable)
Memory at c0000000 (64-bit, prefetchable)
Memory at d0000000 (64-bit, prefetchable)
I/O ports at e000
[virtual] Expansion ROM at 000c0000 [disabled]
Capabilities:
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_drm, nvidia
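Rather than racing the hang to grab a dmesg, the kernel messages of the crashed boot can usually be read back after rebooting. This assumes systemd-journald keeps a persistent journal (for example, /var/log/journal exists), which I have not verified for this machine. A minimal Python sketch; the helper names and the 40-line tail are my own choices, not anything from the driver or from systemd:

```python
import subprocess


def previous_boot_kernel_log_cmd(boots_back=1):
    """Build the journalctl command for kernel messages of an earlier boot.

    -k restricts output to kernel messages (what dmesg would show),
    -b -N selects the boot N boots before the current one.
    """
    if boots_back < 0:
        raise ValueError("boots_back must be >= 0")
    return ["journalctl", "-k", "-b", f"-{boots_back}", "--no-pager"]


def tail(text, n=40):
    """Return the last n lines of text as a list of strings."""
    lines = text.splitlines()
    return lines[-n:] if n > 0 else []


def read_previous_boot_tail(boots_back=1, n=40):
    """Run journalctl and return the last n kernel log lines of that boot.

    Assumes journalctl is installed and a persistent journal is available.
    """
    result = subprocess.run(
        previous_boot_kernel_log_cmd(boots_back),
        capture_output=True,
        text=True,
        check=True,
    )
    return tail(result.stdout, n)
```

Calling `read_previous_boot_tail()` after the forced reboot should give the last kernel lines written before the hang, as long as they made it to disk.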
Right then. If none of the remedial suggestions I’ve made have yielded a consistently functioning resume from suspend then we are now officially grasping at straws (unless someone else has a further insight into this issue).
According to the following article, your RM650i was made by CWT, a respected OEM, so there should be no problem with its quality.
I’m assuming that you researched the wattage your power supply would need to satisfy your current PC’s configuration, plus a little extra to accommodate any reasonable expansion? If not:
It seems you have one such configuration and that to save on idle power consumption you’ll have to shut your machine down or if possible schedule it to do so after a user defined period of inactivity.
–
BTW. Installing mate-themes via the Synaptic Package Manager will yield an attractive charcoal theme called BlackMATE and installing grml-rescueboot will allow you to loop-mount Linux Mint .iso images so that installing a fresh OS can occur at HDD or SSD speeds.
Though it employs an nVidia GPU, your EVGA 980 Ti Hybrid is still an EVGA product sporting EVGA’s custom PCB, VRM and firmware / BIOS, all of which differ from a card manufactured by nVidia itself or by any of the other nVidia-based graphics card manufacturers. Starting a help thread on the EVGA Forums may draw in some further user insight into resolving the resume from suspend issue:
To avoid going over ground already covered, you could quote or copy and paste selected portions of this thread to bring EVGA forum members up to speed more quickly on which steps have already been taken.
I suspect that how a motherboard’s UEFI / BIOS ‘Power’, ‘DDR power down mode’ and ‘S3 Video Repost’ settings (or however they’re worded) are adjusted may influence the effectiveness of the following info:
Many thanks for all the hints and tips. Going through all the options thoroughly has turned out to be a bit complicated, so I’ll turn this into a weekend project.
It’s my production system so I don’t want to leave it in a non-working state.
I’ve also got another hardware platform made available and will install the card there just to see if the behavior is similar.
I have not seen that. Thanks for the tip, I’ll try.
The machine has on board graphics if the nvidia card should fail.
What makes this difficult for me to troubleshoot is the erratic nature of the failures. It doesn’t always fail to resume. Sometimes it does, sometimes it doesn’t. I have not been able to find any pattern in when it fails.
Nvidia Driver Version: 375.10
Xorg: 1.18.4 (11804000)
Kernel: 4.8.6.1
Mainboard: Asus Z170-P D3
Bios: 2002
GPU: GTX Titan X (Maxwell)
Sometimes the system resumes correctly. But in more than 50% of cases it shows a black screen, and when I SSH in, I see Xorg at 100% load. Kill commands have no effect.
If the machine has been asleep for some hours, resume fails in 90+ percent of cases.
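Since the failure rate seems to depend on how long the machine slept, keeping a simple log of each suspend and tallying failures by sleep duration might confirm or rule out that pattern. A minimal Python sketch, assuming each attempt is noted by hand as (minutes asleep, resumed OK); the bucket boundaries and function names are my own assumptions, and the sample values are placeholders, not measurements:

```python
from collections import defaultdict

# Upper bounds (in minutes) of the sleep-duration buckets; an arbitrary choice.
BUCKETS = [(10, "<10 min"), (60, "10-60 min"), (float("inf"), ">=60 min")]


def bucket_label(minutes):
    """Return the label of the first bucket whose upper bound exceeds minutes."""
    for upper, label in BUCKETS:
        if minutes < upper:
            return label
    raise ValueError("minutes must be a finite number")


def failure_rates(attempts):
    """Map bucket label -> (failures, total, failure rate).

    attempts is an iterable of (minutes_asleep, resumed_ok) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # label -> [failures, total]
    for minutes, resumed_ok in attempts:
        entry = counts[bucket_label(minutes)]
        entry[1] += 1
        if not resumed_ok:
            entry[0] += 1
    return {label: (f, t, f / t) for label, (f, t) in counts.items()}


# Placeholder values for illustration only, not real measurements.
example = [(5, True), (8, False), (240, False), (300, False)]
print(failure_rates(example))
```

A few dozen logged attempts should be enough to see whether the long-sleep bucket really fails more often than the short one.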
I had another system available and temporarily moved the GPU to it to troubleshoot.
Resume from suspend has been working perfectly in this setup for at least 10 resumes now with no failures, and I’m tempted to conclude it will not fail in this system. Sadly, it is an antique system, so I need to move the GPU back to my modern problem system for daily work. So I believe the GPU itself has no problems, and the problem is not the driver.
The working system is an old Intel Core 2 platform, referred to as Core from here on, and the new problematic system is a Skylake. The Skylake system WITHOUT the GPU resumes without any failures.
I now believe (expert advice welcome) that the problem is in the combination Mobo / GPU / BIOS, hopefully as a result of a BIOS setting.
Details about the platforms:
Core:
Mobo: Gigabyte EP45-UD3LR
CPU: Intel Core2 Quad
PSU: Nexus 600
Integrated graphics: No
I’m really in the dark here but I have one guess. The GPU resumes fine in the old system, which I believe is slow. As naanoo writes, the GPU seems to fail more often if it has been suspended longer. Maybe the GPU discharges more and takes longer to get enough power to wake up and respond? My guess now is that the Skylake system resumes faster, and that the GPU is not given enough time to wake up and respond. The OS then times out waiting for the GPU’s response and Xorg crashes. Is this reasonable?
What speaks against your Mobo/GPU/BIOS thesis: on the same machine on which I have the problem, suspend/resume works fine with Ubuntu 16.04 … 100% of the time … and has for over a year. It still does now, when I boot the other drive.