I have not seen that. Thanks for the tip, I’ll try.
The machine has onboard graphics in case the NVIDIA card should fail.
What makes this difficult for me to troubleshoot is the erratic nature of the failures. It doesn’t always fail to resume. Sometimes it does, sometimes it doesn’t. I have not been able to find any pattern for when it fails.
Nvidia Driver Version: 375.10
Xorg: 1.18.4 (11804000)
Kernel: 4.8.6.1
Mainboard: Asus Z170-P D3
Bios: 2002
GPU: GTX Titan X (Maxwell)
Sometimes the system resumes correctly. But in more than 50% of the cases it shows a black screen, and when I SSH in, I see Xorg at 100% CPU load. Kill commands have no effect.
If the machine has been asleep for some hours, resume fails in more than 90% of cases.
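For anyone trying to reproduce this diagnosis over SSH, a small helper like the following can report Xorg's CPU load and attempt a graceful kill before resorting to SIGKILL. The function name is my own; the interesting detail is that if Xorg is stuck in uninterruptible sleep (D state) waiting on the GPU, even SIGKILL has no effect, which matches the symptom above.

```shell
# check_xorg: report the Xorg process and its CPU usage, then try to
# terminate it gracefully before resorting to SIGKILL.
# (helper name is my own; adjust the process name if your setup differs)
check_xorg() {
  local pid
  pid=$(pgrep -x Xorg | head -n1)
  if [ -z "$pid" ]; then
    echo "Xorg is not running"
    return 0
  fi
  echo "Xorg pid $pid at $(ps -o %cpu= -p "$pid" | tr -d ' ')% CPU"
  sudo kill "$pid"        # polite SIGTERM first
  sleep 5
  if kill -0 "$pid" 2>/dev/null; then
    echo "SIGTERM ignored, sending SIGKILL"
    sudo kill -9 "$pid"   # still no effect if Xorg is stuck in D state
  fi
}
```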
I had another system available and moved the GPU to it temporarily to troubleshoot.
Resume from suspend has been working perfectly in this setup for at least 10 resumes now with no failures, and I’m tempted to conclude it will not fail in this system. Sadly, it is an antique system, so I need to move the GPU back to my modern problem system for daily work. So I believe the GPU itself is fine, and the problem is not the driver either.
The working system is an old Intel Core 2 platform, referred to below as Core; the new, problematic system is a Skylake. The Skylake system WITHOUT the GPU resumes without any failures.
I now believe (expert advice welcome) that the problem lies in the combination of mobo / GPU / BIOS, hopefully as a result of a BIOS setting.
Details about the platforms: Core:
Mobo: Gigabyte EP45-UD3LR
CPU: Intel Core2 Quad
PSU: Nexus 600
Integrated graphics: No
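Counting successful resumes by hand is tedious, so a loop like the one below can automate the test using an RTC alarm to wake the machine. This is a sketch of my procedure, not a polished tool: it needs root for `rtcwake`, and the sleep durations and cycle count are arbitrary.

```shell
# suspend_cycle_test: suspend to RAM N times, waking via an RTC alarm,
# and stop at the first cycle where Xorg does not survive the resume.
# (sketch only; needs root, assumes suspend-to-RAM "mem" is supported)
suspend_cycle_test() {
  local cycles=${1:-10} i
  for i in $(seq 1 "$cycles"); do
    echo "cycle $i/$cycles: suspending for 60 s"
    sudo rtcwake -m mem -s 60 || { echo "suspend failed on cycle $i"; return 1; }
    sleep 15   # give the desktop time to come back
    if ! pgrep -x Xorg >/dev/null; then
      echo "Xorg gone after resume $i"
      return 1
    fi
  done
  echo "all $cycles suspend/resume cycles passed"
}
```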
I’m really in the dark here, but I have one guess. The GPU resumes fine in the old system, which I believe is slow. As naanoo writes, the GPU seems to fail more often the longer it has been suspended. Maybe the GPU discharges more and needs longer to get enough power to wake up and respond? My guess is that the Skylake system resumes faster, and the GPU is not given enough time to wake up and respond. The OS then times out waiting for the GPU, and Xorg crashes. Is this reasonable?
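If the timing hypothesis is right, the kernel log should show a PM error or an unusually long resume step for the nvidia device. A quick way to check after a failed resume (exact message wording varies by kernel version, and reading `dmesg` may require root on restricted systems):

```shell
# show_resume_timings: print the last kernel messages about suspend/resume
# and the nvidia module, where a resume timeout or PM error should show up.
# (diagnostic sketch; message wording varies by kernel version)
show_resume_timings() {
  dmesg | grep -iE 'PM: (suspend|resume)|nvidia' | tail -n 20
}
```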
What speaks against your mobo/GPU/BIOS thesis: on the same machine on which I have the problem, suspend/resume under Ubuntu 16.04 works fine, 100% of the time, and has for over a year. It still does now, when I boot the other drive.
The reason I’m asking about CUDA is that when I removed the GPU from the Skylake system, I purged all NVIDIA modules (I wasn’t thinking about it at the time, but that also included CUDA). I just moved the GPU back onto the Skylake platform, and something is different. I thought I would need to reinstall the NVIDIA drivers, but they were already there?! And I now have an nvidia-prime icon in my task bar that certainly was not there before. And I have not been able to provoke a failed resume from suspend so far. I’m thinking CUDA might have introduced some problem? Not sure yet; I’m still testing. Also, the tests I ran on Fedora showed the same issues, and CUDA was not on that system.
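To see exactly what the purge removed and what came back, something like this lists the installed NVIDIA/CUDA packages and the loaded kernel modules. I’m assuming Debian/Ubuntu package tooling here; package names differ on Fedora (rpm/dnf).

```shell
# nvidia_inventory: list installed NVIDIA/CUDA packages and loaded
# kernel modules, to verify what a reinstall actually brought back.
# (assumes Debian/Ubuntu tooling; use rpm -qa on Fedora instead)
nvidia_inventory() {
  echo "--- installed packages ---"
  dpkg -l | grep -iE 'nvidia|cuda' || echo "(none found)"
  echo "--- loaded kernel modules ---"
  lsmod | grep -i nvidia || echo "(none loaded)"
}
```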