Problems resuming from suspend with GTX 1060 in Fedora 27

Suspend works a few times and then I get a black screen with a frozen mouse pointer. I am able to SSH into the machine and create a bug report, but I am unable to reboot the machine without using the ALT-SYSRQ-REISUB key combination.

I am using the latest RPM Fusion drivers for the card: kmod-nvidia-4.14.6-300.fc27.x86_64-387.22-1.fc27.x86_64 and akmod-nvidia-387.22-1.fc27.x86_64.

Xorg is xorg-x11-server-Xorg-1.19.5-1.fc27.x86_64

I will attach the report here. I am not sure how to get more detailed reports from X because I am running it from systemd. I will do so if someone tells me how.

I hope everyone is having a good winter holiday.

Thanks!

nvidia-bug-report.log.gz (76.4 KB)

Dec 24 05:28:25 rigel kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000000 00000080 00000000 00000005 00001005
Dec 24 05:28:25 rigel kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000001 00000080 00000000 00000005 0000000d
Dec 24 05:28:25 rigel kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000001 00000080 00000000 00000005 0000000d

This is the same error I have been seeing since October.
I truly hope NVIDIA takes a look now that there is more than one person reporting this.
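
For anyone comparing notes, here is a rough Python sketch for counting Xid events in a saved kernel log. The regular expression is only inferred from the three lines quoted above, and the function name count_xids is made up for illustration, so treat it as a starting point rather than an official parser.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this thread, e.g.
#   "... kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000000 ..."
XID_RE = re.compile(r"NVRM: Xid \((?P<device>[^)]+)\): (?P<code>\d+),")


def count_xids(lines):
    """Return a Counter keyed by (PCI device, Xid code)."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[(match.group("device"), int(match.group("code")))] += 1
    return counts


# The three lines posted above, used as sample input.
sample = [
    "Dec 24 05:28:25 rigel kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000000 00000080 00000000 00000005 00001005",
    "Dec 24 05:28:25 rigel kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000001 00000080 00000000 00000005 0000000d",
    "Dec 24 05:28:25 rigel kernel: NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000001 00000080 00000000 00000005 0000000d",
]

xid_counts = count_xids(sample)
for (device, code), n in xid_counts.items():
    print(f"{device}: Xid {code} seen {n} time(s)")
```

To run it against a real machine, the kernel messages could be saved first (for example from journalctl -k) and the file's lines passed to count_xids.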

Running kernel 4.14.8-300.fc27.x86_64 now. Kernel modules are kmod-nvidia-4.14.8-300.fc27.x86_64-387.34-1.fc27.x86_64 and akmod-nvidia-387.34-1.fc27.x86_64, again from RPM Fusion.

Xorg is xorg-x11-server-Xorg-1.19.5-1.fc27.x86_64

Symptoms are nearly the same as before, but this time the lock screen was showing with the clock and the mouse wasn’t frozen. Again, I was unable to get any further. sudo reboot failed and I had to use ALT-SYSRQ-REISUB to reboot.

The report is attached. Is there more information needed? I’m not sure what else to post here.

Thanks!

nvidia-bug-report.log.gz (94.4 KB)

@peregrine, did this start with gnome 3.26.1?
I’m pretty sure I never saw Xid 56 before gnome 3.26.1
Maybe nvidia driver doesn’t like the new gnome release?

Does the error go away if you run “/usr/bin/nvidia-smi -pm 1” after logging into gnome after a reboot (with nvidia-persistenced daemon disabled)?

@HussamT

The problem with resuming from suspend has been with me since 4.13.14, I believe. So at least a month. Initially, I thought I was having a problem with the kernel but then found I’ve been having it sporadically ever since. By sporadically, I mean I have a problem after about 5 or 6 suspend/resume cycles, but usually more than that. So pretty stable really, but I can’t reliably leave any programs open when I suspend.

rpm -qa gives gnome-shell-3.26.2-3.fc27.x86_64 as the version I have currently, so it looks like I’m a click beyond 3.26.1.

Am I posting in the right forum? I’m not really a developer. I do code a little bit but I don’t wear the uniform (no 1 L bottles of energy drink on my desk). I will give your advice a try but I don’t know if I can give a good answer since my problem only occurs every week or so.

Same here, but I am thinking two heads are better than one. So we may both benefit from comparing notes.
Yes. It is very sporadic. It happens only a few times a month here.
In any case, try running that as root (maybe using sudo). I’m doing the same. We’ll see if we get some similar results.

peregrine, check if using kernel parameter nvidia-drm.modeset=1 works around the issue.
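
A quick way to confirm the parameter actually took effect after a reboot is to look at the kernel command line. The Python sketch below parses a command-line string; the function name modeset_setting and the example strings are hypothetical, not taken from anyone's machine in this thread.

```python
def modeset_setting(cmdline):
    """Return the value given to nvidia-drm.modeset in a kernel
    command line string, or None if the parameter is absent."""
    value_found = None
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        # The kernel treats '-' and '_' as equivalent in parameter
        # names, so accept both spellings.
        if sep and name.replace("-", "_") == "nvidia_drm.modeset":
            value_found = value  # keep the last occurrence
    return value_found


# Hypothetical command lines for illustration only.
print(modeset_setting("ro rhgb quiet nvidia-drm.modeset=1"))  # parameter set
print(modeset_setting("ro rhgb quiet"))                       # parameter absent
```

On a running system the string to check would come from /proc/cmdline.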

@generix: using that kernel parameter makes the login screen on X very difficult to use. Mouse and keyboard input lags and stutters badly. I use “Gnome on Xorg” but it still has issues at boot.

Try the workaround for that:
https://devtalk.nvidia.com/default/topic/957814/linux/prime-and-prime-synchronization/post/5226294/#5226294

Even on desktops with onboard intel graphics disabled, nvidia-drm.modeset=1 breaks gdm with mutter >= 3.26.1 so that’s not an option.

Okay, I loaded the cuda libraries and tried that command. It took about a month, like you said, between the first two crashes, but I crashed again after I posted last time (just got mad and rebooted - no report), so maybe it won’t take a month this time. I’m ready to fire off that report for what it’s worth when it happens again.

If it still shows Xid 56 with persistence mode, then trying it again won’t help so feel free to ignore my suggestion.
I was hoping it would fix that error. Thanks for checking.

This is the first time I tried it actually. The crash happened before your suggestion. So I guess we’ll see what happens?

Alright, thank you. Let’s hope for the best.

Well, I had a few suspends and resumes this time and then another crash just now. Here’s the report again.
nvidia-bug-report.log.gz (128 KB)

I saw the Xid 56 error today for the first time since the 22nd of December.
I SSHed into the machine, ran killall Xorg, unloaded all the nvidia modules with modprobe -r, loaded them again with modprobe, and started GDM again. So this is more or less recoverable.
System is running well and nvidia reports the GPU temperature is 28 Celsius.
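
For reference, the recovery steps above could be written down roughly as follows. This is a dry-run Python sketch that only prints the commands. The module list and the use of systemctl restart for GDM are assumptions on my part (check lsmod | grep nvidia for what is actually loaded), and recovery_commands is just an illustrative name.

```python
# Assumption: the modules this driver typically loads, listed in the
# order they would be removed. Verify with `lsmod | grep nvidia`.
NVIDIA_MODULES = ["nvidia_drm", "nvidia_modeset", "nvidia_uvm", "nvidia"]


def recovery_commands(modules=NVIDIA_MODULES, display_manager="gdm"):
    """Build the command sequence described in the post.

    Nothing is executed here; each command is returned as an argv list.
    """
    return [
        ["killall", "Xorg"],                               # stop the hung X server
        ["modprobe", "-r", *modules],                      # unload the nvidia modules
        ["modprobe", "-a", *reversed(modules)],            # load them again
        ["systemctl", "restart", display_manager],         # assumption: GDM runs as a systemd unit
    ]


# Print the commands for review; they would need to be run as root over SSH.
for cmd in recovery_commands():
    print(" ".join(cmd))
```

Printing instead of executing is deliberate, since the exact module set differs between systems and running the wrong sequence on a wedged GPU could make things worse.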

I have a GTX 750 Ti in a Dell 9010 machine, and resume after suspend never works on my setup.

Running the latest FC27. I had to use an HDMI cable to attach my Dell P2715Q instead of a DisplayPort cable, otherwise I get no picture at all. This is running Wayland with an updated mutter.
nvidia-bug-report.log.gz (143 KB)