resume from suspend freezes system (GTX 970, Arch Linux, Kernel 4.4/4.7, NVIDIA 370)

I just switched to this graphics card a week ago, so I haven’t used it with any other (older) driver.

@JonathanAnderson: sorry, I hadn’t seen your post until now. I have a Haswell setup (H97 board and i5-4690).

I am suffering from the same issue after upgrading to the 370 release.
I just finished downgrading to the 367.35 release, and so far so good: I can both suspend the entire system and use DPMS, and the display wakes up without issue in both cases.

Unfortunately, DPMS seems to break with almost every other driver release; I can only assume this is due to the ongoing KMS work.

Tip for those willing to downgrade on Arch: I needed to downgrade my kernel release to 4.7.1-1 as well.

Do you mean this issue is specific to kernel 4.7.2-1-ARCH? Will downgrading just the NVIDIA driver not resolve it?

No, sorry for the confusion. I had to downgrade the kernel alongside the NVIDIA driver just to get the old driver working properly. Xorg and the NVIDIA driver did not work at all until I downgraded my kernel as well. I am not sure whether NVIDIA 367.35 didn’t like kernel 4.7.2, or vice versa.
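When juggling kernel and driver downgrades like this, it helps to confirm which NVIDIA kernel module is actually loaded next to the running kernel. A minimal Python sketch, assuming the loaded module exposes /proc/driver/nvidia/version containing an "NVRM version: ... Kernel Module  <version>" line (the exact wording is an assumption and may differ between releases); parse_nvidia_version and report_versions are hypothetical helper names.

```python
import platform
import re
from pathlib import Path
from typing import Optional

# Assumed location of the NVIDIA module version file (present only while the module is loaded).
NVIDIA_VERSION_FILE = Path("/proc/driver/nvidia/version")


def parse_nvidia_version(text: str) -> Optional[str]:
    """Extract a driver version such as '367.35' from the NVRM line, if present."""
    match = re.search(r"Kernel Module\s+(\d+\.\d+(?:\.\d+)?)", text)
    return match.group(1) if match else None


def report_versions() -> str:
    """Hypothetical helper: summarize the running kernel and the loaded NVIDIA module."""
    kernel = platform.release()  # e.g. '4.7.1-1-ARCH' on the setup described above
    try:
        driver = parse_nvidia_version(NVIDIA_VERSION_FILE.read_text())
    except OSError:
        driver = None  # module not loaded, file missing, or not readable
    return f"kernel={kernel} nvidia={driver or 'not loaded'}"
```

Calling print(report_versions()) after each downgrade gives a one-line kernel/driver pair that is easy to paste into a post like the ones in this thread.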

OS: Ubuntu 16.04
Kernel: 4.4.0-36-generic
NVIDIA driver: 367.35
NVIDIA card: GeForce GTX 970
Monitors: Samsung S24D391, Samsung SMBX2335, Acer S22OHQL

I’m experiencing the exact same behavior described in the original post. It happens erratically.

This issue is still present in 370.28 with the same specs as above.

Hi devs & users

Here is what I have:

Desktop PC, Ubuntu MATE
kernel 4.3 rc1
xorg 1.18
nvidia-370.28 GT-610

nouveau is blacklisted.
Suspend is not working correctly: the LED stays illuminated and the PC won’t resume after pushing the LED button.

PS: it’s no fun having to power off the PC 5 to 8 times a day.

MAY GOD HELP US FIND A PERMANENT SOLUTION TO THIS PROBLEM…

AMEN

Make sure to blacklist the nouveau driver while using the NVIDIA driver. You can add nouveau to /etc/modprobe.d/blacklist.conf, or create a file such as /etc/modprobe.d/disable-nouveau.conf with the following entries:
blacklist nouveau
options nouveau modeset=0

Then add the kernel parameters vga=0 rdblacklist=nouveau nouveau.modeset=0 and reboot.
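After rebooting, it is worth confirming that nouveau really is not loaded and that the parameters actually reached the kernel. A minimal Python sketch, assuming the standard Linux /proc/modules (module name is the first field of each line) and /proc/cmdline files; loaded_modules, missing_params and check_nouveau are hypothetical helper names.

```python
from pathlib import Path

# Kernel parameters suggested above for keeping nouveau out of the way.
SUGGESTED_PARAMS = ("vga=0", "rdblacklist=nouveau", "nouveau.modeset=0")


def loaded_modules(proc_modules: str) -> set[str]:
    """Module names are the first whitespace-separated field of each /proc/modules line."""
    return {line.split()[0] for line in proc_modules.splitlines() if line.strip()}


def missing_params(cmdline: str) -> list[str]:
    """Return the suggested parameters that are absent from the kernel command line."""
    present = set(cmdline.split())
    return [param for param in SUGGESTED_PARAMS if param not in present]


def check_nouveau() -> None:
    """Hypothetical helper: print what the running system actually loaded and booted with."""
    modules = loaded_modules(Path("/proc/modules").read_text())
    cmdline = Path("/proc/cmdline").read_text()
    print("nouveau loaded:", "nouveau" in modules)
    print("nvidia loaded:", "nvidia" in modules)
    print("missing kernel params:", missing_params(cmdline) or "none")
```

If "nouveau loaded" still reports True, the blacklist file or the initramfs was probably not regenerated or picked up on this boot.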

Hi all, please attach an NVIDIA bug report as soon as the issue reproduces, along with the reproduction steps. Also: which desktop environment are you running (KDE, GNOME, Unity, MATE, etc.)? Does the issue reproduce with bare X (started with the xinit, X, or Xorg command)? What NVIDIA-related errors do you see in the log when the issue reproduces? Is the issue with DP, DVI, or HDMI monitors?

Great to see some action on this.

Have you reported the bug anywhere else?
Launchpad?
bugzilla.kernel.org?

@sandipt Yes, I tried that, since I noticed that nouveau was loaded for some reason even though it was blacklisted by Arch Linux’s nvidia-dkms package. After adding the kernel parameters, nouveau is completely blacklisted. This didn’t solve the freezes, though.

Reproducing it is very easy: simply suspend your system (“systemctl suspend”) and wake it up.
The whole system then becomes unresponsive and needs a hard shutdown, so it’s hard to attach any logs.
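Even after a hard shutdown, kernel messages from the frozen session can sometimes be recovered on the next boot, provided systemd-journald stores the journal persistently (for example when /var/log/journal exists or Storage=persistent is set in /etc/systemd/journald.conf) and the messages reached disk before the freeze. A minimal Python sketch under those assumptions; previous_boot_kernel_log_cmd, nvidia_lines and fetch_previous_boot_nvidia_lines are hypothetical helper names, while -k, -b -1 and --no-pager are standard journalctl options.

```python
import subprocess

# Substrings that mark NVIDIA kernel messages (NVRM: and nvidia-modeset: prefixes).
NVIDIA_KEYWORDS = ("nvidia", "nvrm")


def previous_boot_kernel_log_cmd() -> list[str]:
    """journalctl command for the kernel messages of the boot before the current one."""
    # -k: kernel messages only, -b -1: previous boot, --no-pager: plain output.
    return ["journalctl", "-k", "-b", "-1", "--no-pager"]


def nvidia_lines(log_text: str) -> list[str]:
    """Keep only lines that mention the NVIDIA driver (case-insensitive)."""
    return [
        line
        for line in log_text.splitlines()
        if any(keyword in line.lower() for keyword in NVIDIA_KEYWORDS)
    ]


def fetch_previous_boot_nvidia_lines() -> list[str]:
    """Run journalctl and return the NVIDIA-related lines from the previous boot."""
    result = subprocess.run(
        previous_boot_kernel_log_cmd(), capture_output=True, text=True, check=True
    )
    return nvidia_lines(result.stdout)
```

Run fetch_previous_boot_nvidia_lines() right after the forced reboot, before the previous boot rotates further back in the journal.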

I am using Gnome 3.20 with Xorg (Starting from xinit/startx; Linux 4.7.3-ARCH; NVIDIA 370.28; GTX 1060 6GB) on a DP monitor.

Well, I am not sure why you can’t reproduce this issue; as you can see, lots of people clearly have this exact problem (and it’s quite depressing that I had to switch back to Intel graphics because this simply sucks ;)).

That said, I am happy to help further.

Hi Protoss1, Mounir, eyalzek,

Can I get an NVIDIA bug report as soon as the issue reproduces on your setup? Please note that not all “resume from suspend freezes system” issues necessarily have the same root cause. Check whether you get the error messages below in the log or dmesg:

[ 631.393127] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000957d:0:0:0x00000040
[ 635.392711] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000917e:0:0:0x00000001
[ 639.392399] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000927c:0:0:0x00000001
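To check a long dmesg dump for exactly these messages, a small filter is enough. A minimal Python sketch, assuming the line format matches the three samples quoted above; the trailing code string is kept raw because its meaning is not explained in this thread, and find_idle_timeouts and IdleTimeout are hypothetical names.

```python
import re
from typing import NamedTuple

# Matches lines like:
# [ 631.393127] nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000957d:0:0:0x00000040
IDLE_TIMEOUT = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+nvidia-modeset: ERROR: GPU:(?P<gpu>\d+): "
    r"Idling display engine timed out: (?P<detail>\S+)"
)


class IdleTimeout(NamedTuple):
    timestamp: float  # seconds since boot, as printed by dmesg
    gpu: int
    detail: str  # raw code string, left uninterpreted


def find_idle_timeouts(log_text: str) -> list[IdleTimeout]:
    """Return every 'Idling display engine timed out' error found in dmesg text."""
    hits = []
    for line in log_text.splitlines():
        match = IDLE_TIMEOUT.search(line)
        if match:
            hits.append(IdleTimeout(float(match["ts"]), int(match["gpu"]), match["detail"]))
    return hits
```

Feeding it the output of dmesg (for example subprocess.run(["dmesg"], capture_output=True, text=True).stdout) shows at a glance whether your freeze is the one being tracked here or a different root cause.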

Hi all, is there any earlier or later driver that is not affected by this issue?

Hi sandip,
This error appears. I would also note that Ctrl+Alt+F7 (or F8) does not bring back the GUI; the screen stays black and unresponsive… Maybe something is wrong in nv.c with the IRQ threading.
Regards
Mounir

Hi Mounir, can I get an NVIDIA bug report as soon as the issue reproduces on your setup?

I cannot access the desktop via SSH when it is in this state. How can I launch the bug report script?

I just managed to reproduce my issue, after roughly a week of uptime and sleeping/resuming without issue.

Linux kernel 4.7.2-1-ARCH
Nvidia driver version 370.28-1

My machine did not fully lock up thankfully, so I was able to SSH in and run nvidia-bug-report.sh: https://es.gy/d/nvidia-bug-report.log.gz
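When the machine still answers over the network, the bug report can be generated remotely and copied back. A minimal Python sketch, assuming an SSH server on the affected machine, sudo rights for the remote user, and that nvidia-bug-report.sh writes nvidia-bug-report.log.gz to its working directory by default (the remote home directory for a plain ssh command); remote_bug_report_cmds and collect_remote_bug_report are hypothetical names.

```python
import subprocess


def remote_bug_report_cmds(host: str, dest: str = ".") -> list[list[str]]:
    """Commands that generate an NVIDIA bug report on `host` and copy it to `dest`."""
    return [
        # -t allocates a terminal so sudo can prompt for a password if needed.
        ["ssh", "-t", host, "sudo", "nvidia-bug-report.sh"],
        # Assumes the report landed in the remote user's home directory.
        ["scp", f"{host}:nvidia-bug-report.log.gz", dest],
    ]


def collect_remote_bug_report(host: str, dest: str = ".") -> None:
    """Run the commands in order, stopping at the first failure."""
    for cmd in remote_bug_report_cmds(host, dest):
        subprocess.run(cmd, check=True)
```

From a second computer, collect_remote_bug_report("user@frozen-box") (host name hypothetical) leaves the report in the current directory. If the machine has fully locked up, as in several reports above, SSH will not connect and the previous-boot journal is the only remaining source.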

Please let me know if I could be of further assistance, I’d love to find a solution.

How hard can it really be to reproduce the problem? You are a hardware company. Put together a system with the hardware mentioned. You need a motherboard and a CPU, but chances are 95% that you already own what is needed. Then slot in one of your own GPUs; most from the 9XX and 10XX families seem to crash the system. Install Linux (Mint, Ubuntu, Fedora, probably any distribution) with a recent kernel, and your recent proprietary driver crashes it.

We are many suffering from this and would be happy to troubleshoot further.
What I’d like to know, to decide whether anything can be done tweak-wise, is: on which systems does suspend/resume work with these GPUs?
Are 100% of these GPUs suffering from the issue, or are just 10% of us on bad hardware combinations?
Is it Skylake? Can I make it work just by switching to an older motherboard?

What do people know?

Under which bugs are these listed in the kernel Bugzilla / Launchpad / Red Hat Bugzilla?

Is anyone running Linux with working suspend/resume on an NVIDIA 9XX / 10XX? Please chime in.

@sandipt: I have been running my system with the suggestions from #12 for 10 days now without any problems.

I managed to reproduce the issue again today. Maybe I should start a new thread for my issue, since my system now seems to stay responsive and it’s just the display driver crashing?

My new bug report: https://es.gy/d/nvidia-bug-report.log.0923.gz
I also found that the driver doesn’t appear to crash immediately, but if I let the system sit long enough with my monitor cycling in and out of suspend (as if it’s about to start displaying things again), dmesg eventually shows the following: https://es.gy/p/nvidia-dmesg

Really eager to do anything necessary to help resolve this.

Happened again:
http://s000.tinyupload.com/index.php?file_id=02495089438692938813

Tracking this issue under internal bug 200238391
I think most Arch Linux users are affected with this kernel.