(SOLVED) resume from suspend not working with 980 Ti, drivers 352 - 370, kernels 3.16 - 4.4

Have you seen this?

Post #3

“Barteks2x, I have the same laptop, you have to add acpi_osi="!Windows 2013" to the kernel command line for suspend/resume to work for kernels >3.14”

Struggling with my Geforce GT 635M and the nvidia Linux driver. - NVIDIA Developer Forums
https://devtalk.nvidia.com/default/topic/955952/linux/struggling-with-my-geforce-gt-635m-and-the-nvidia-linux-driver-/

I have not seen that. Thanks for the tip, I’ll try.

The machine has on board graphics if the nvidia card should fail.

What makes this difficult for me to troubleshoot is the erratic nature of the failures. It doesn’t always fail to resume. Sometimes it does, sometimes it doesn’t. I have not been able to find any pattern in when it fails.

Have you tried clearing the CMOS since the hit & miss resume from suspend behavior began?

Asus is pretty diligent about releasing new UEFI / BIOS updates. You may want to check in once per month for any new ones.

Exactly the same problem here with:

Nvidia Driver Version: 375.10
Xorg: 1.18.4 (11804000)
Kernel: 4.8.6.1
Mainboard: Asus Z170-P D3
Bios: 2002
GPU: GTX Titan X (Maxwell)

Sometimes the system resumes correctly, but in more than 50% of cases it shows a black screen, and when I SSH in I see Xorg at 100% load. Kill commands have no effect.

If the machine has been asleep for a few hours, resume fails in 90+ percent of cases.
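When kill has no effect on a process, it can help to look at its state over SSH. A minimal sketch, assuming a Linux /proc filesystem: it reads the state field from a /proc/<pid>/stat line. The function names and the sample lines are made up for illustration; a process in state D (uninterruptible sleep) does not react to signals until the kernel call it is stuck in returns. This only shows the state, it does not diagnose the resume failure itself.

```python
# Sketch: decode the process state from a /proc/<pid>/stat line.
# Function names are hypothetical; the state letters are the standard
# Linux ones (R, S, D, Z, T).

STATE_NAMES = {
    "R": "running",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep",
    "Z": "zombie",
    "T": "stopped",
}


def parse_state(stat_line):
    """Return the one-letter state code from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST closing parenthesis.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def describe_state(stat_line):
    """Return a readable description of the state in a stat line."""
    code = parse_state(stat_line)
    return STATE_NAMES.get(code, "other (" + code + ")")


def read_stat(pid):
    """Read the stat line of a live process (Linux only)."""
    with open("/proc/%d/stat" % pid) as f:
        return f.read()
```

Usage over SSH would be something like `describe_state(read_stat(pid))` with the PID of the hung Xorg process.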

nvidia-bug-report:
http://www.naanoo.com/upstream/nvidia-bug-report.log.gz

Tried adding acpi_osi="!Windows 2013" to the kernel command line. No perceived change.
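To rule out the parameter simply not reaching the kernel (for example because the bootloader config was not regenerated), the running command line can be checked. A minimal sketch, assuming Linux and that the parameter shows up in /proc/cmdline with shell-style quoting; the function names are hypothetical.

```python
# Sketch: list the acpi_osi= values present on a kernel command line.
import shlex


def acpi_osi_values(cmdline):
    """Return every value passed as acpi_osi=... on the given command line."""
    values = []
    # shlex.split keeps a quoted value such as "!Windows 2013" in one token.
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        if key == "acpi_osi" and sep:
            values.append(value)
    return values


def running_cmdline():
    """Read the command line of the currently running kernel (Linux only)."""
    with open("/proc/cmdline") as f:
        return f.read()
```

On the affected machine, `acpi_osi_values(running_cmdline())` returning an empty list would mean the parameter was never applied.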

I had another system available and temporarily moved the GPU to it to troubleshoot.
Resume from suspend has been working perfectly in this setup for at least 10 resumes now with no failures, and I’m tempted to conclude it will not fail in this system. Sadly, it is an antique system, so I need to move the GPU back to my modern problem system for daily work. So I believe the GPU itself has no problems, and that the problem is not the driver.

The working system is an old Intel Core 2 platform, referred to as “Core” from here on, and the new problematic system is a Skylake. The Skylake system WITHOUT the GPU resumes without any failures.

I now believe (expert advice welcome) that the problem is in the combination of Mobo / GPU / BIOS, hopefully as a result of a BIOS setting.

Details about the platforms:
Core:
Mobo: Gigabyte EP45-UD3LR
CPU: Intel Core2 Quad
PSU: Nexus 600
Integrated graphics: No

Skylake:
Mobo: Asus Z170M-PLUS
CPU: Intel Core i7-6700
PSU: Corsair RM650i
Integrated graphics: yes (unused)

I’m really in the dark here, but I have one guess. The GPU resumes fine in the old system, which I believe is slow. As naanoo writes, the GPU seems to fail more often if it has been suspended longer. Maybe the GPU discharges more and takes longer to get enough power to wake up and respond? My guess now is that the Skylake system resumes faster, and that the GPU is not given enough time to wake up and respond. The OS then times out waiting for the GPU’s response and Xorg crashes. Is this reasonable?

The Skylake system runs the latest BIOS and CMOS has been cleared as instructed. Still failing.

I ssh’d into the Skylake system when Xorg had crashed on resume.
The GPU shows up in lspci. Does this mean that it is awake and responding correctly?

@ JonathanAnderson

What speaks against your Mobo/GPU/BIOS thesis: on the same machine on which I have the problem, suspend/resume works fine with Ubuntu 16.04 … 100% of the time … and has for over a year. It still does now, when I boot the other drive.

What I am sure of:

On my machine it has nothing to do with:

  • audio interface
  • other pcie cards
  • usb devices
  • the monitors
  • hard/ssd-drives

… I have been testing for 2 or 3 weeks already ;-)

Sorry naanoo, I did not understand this.
Do you mean that you are DUAL booting, and that the problem is only with Ubuntu, not with Windows?

naanoo, do you have CUDA installed?

I have two SSDs:

  1. Arch Linux
  2. Ubuntu Gnome 16.04

With Ubuntu there are no problems suspending / resuming.

Yes, I have CUDA installed.

You have problems with Ubuntu?
No problems with Arch?

Do you have CUDA installed in both?

I have the same (or similar) problems with Fedora and Mint Cinnamon (Ubuntu 16.04).

No, the other way round.

Ubuntu 16.04 → resume works
Arch → black screen problem

Do you have CUDA installed in both?

I am not 100% certain whether my Ubuntu installation also has CUDA. I will have to take a look.

The reason I’m asking about CUDA is that when I moved the GPU out of the Skylake system, I purged all nvidia modules (I wasn’t thinking about it, but that also included CUDA). I just moved the GPU back onto the Skylake platform, and something is different. I thought I would need to reinstall the Nvidia drivers, but they were already there?! And I now have an nvidia prime icon running in my task bar that certainly was not there before. And I have not been able to provoke a failure on resuming from suspend so far. I’m thinking CUDA might have introduced some problem, maybe? Not sure so far, I’m still testing. Also, I think the Fedora system I tested had the same issues, and CUDA was not on that system.

Mmmmhhhh … I can not boot the Ubuntu-Drive at the moment, because I have some important jobs running on Arch.

But I can mount the drive. Do you know the filesystem paths where I can look for a CUDA installation?
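One way to scan a mounted drive, as a rough sketch: NVIDIA's own installer defaults to /usr/local/cuda (usually with a versioned directory such as /usr/local/cuda-X.Y next to it), and Arch's cuda package uses /opt/cuda. Those two locations and the check for bin/nvcc are assumptions here; a custom install prefix would not be found. The function name and the mount point in the usage note are hypothetical.

```python
# Sketch: look for CUDA toolkit directories under a mounted root filesystem.
from pathlib import Path

# Assumed default locations, relative to the root of the mounted drive.
CANDIDATE_PATTERNS = ("usr/local/cuda*", "opt/cuda*")


def find_cuda_dirs(mount_root):
    """Return candidate directories under mount_root that contain bin/nvcc."""
    root = Path(mount_root)
    hits = []
    for pattern in CANDIDATE_PATTERNS:
        for path in sorted(root.glob(pattern)):
            # An absolute symlink such as /usr/local/cuda resolves against
            # the running system, not the mounted drive, so it may be
            # skipped; the versioned directory it points to is still matched.
            if (path / "bin" / "nvcc").exists():
                hits.append(path)
    return hits
```

With the Ubuntu drive mounted at, say, /mnt/ubuntu, `find_cuda_dirs("/mnt/ubuntu")` returning an empty list would suggest there is no toolkit in the default places.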

Okay. I scanned the drive.

The Ubuntu Installation has NO CUDA installation.

Good! You’re gaining on it!

Perhaps a fresh OS install on a spare drive would shed further light on the why of what is going on?

BTW. How are you partitioning your Skylake’s drives?

I do four partitions on a GPT drive (CSM / Compatibility Support Module, no ‘Secure Boot’):

  • bios_grub = 1MB

  • / = (1024MB x 40 or 40GB, ext4)

  • swap = 1GB more than the m/b’s max. RAM capacity (1024MB x 33 or 33GB) to ensure a functioning resume from suspend

  • /home (ext4) = the rest of the drive.
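The sizes in that layout are simple arithmetic. A small sketch, assuming a maximum RAM capacity of 32GB (which is what the 33GB swap figure implies) and 1GB = 1024MB; the function names are made up for illustration.

```python
# Sketch: partition sizes in MB for the layout described above.
MB_PER_GB = 1024


def swap_size_mb(max_ram_gb):
    """Swap sized at 1GB more than the motherboard's maximum RAM capacity."""
    return (max_ram_gb + 1) * MB_PER_GB


def root_size_mb(root_gb=40):
    """Root partition size in MB, 40GB by default."""
    return root_gb * MB_PER_GB


if __name__ == "__main__":
    print("swap:", swap_size_mb(32), "MB")
    print("root:", root_size_mb(), "MB")
```

For a board with a different maximum RAM capacity, only the argument to `swap_size_mb` changes.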