/proc/driver/nvidia/suspend interface not working

Hi,

I’m looking at a problem where after suspend/resume, the GNOME desktop background has a white tint.

Issues like this might well be explained by the video memory preservation challenges described at https://download.nvidia.com/XFree86/Linux-x86_64/430.09/README/powermanagement.html so I’m trying the /proc/driver/nvidia/suspend interface documented there.

Unfortunately, the system fails to suspend at all when I run “systemctl suspend”. When I run “echo mem > /sys/power/state” manually, the command hangs. No errors are logged; the last messages in the kernel log are:

[  354.515975] PM: suspend entry (deep)
[  354.518018] Filesystems sync: 0.002 seconds

Stepping back a little, I appreciate the attempt at documenting the challenges here and providing a workaround, but I do wonder if we can do better.

Finding the Xorg log via open file descriptors in /proc, then finding the VT that X is on, is rather unclean. I shudder at the thought of putting this into production. Doesn’t the kernel do a VT switch on suspend/resume anyway? Why do we need another one here?

What’s the specific kernel-side challenge that can’t be handled with code in the suspend routine but can be handled by a procfs handler?

How do other GPUs deal with such challenges? Even closer to home, how does nouveau cope?

Separately, I’ve noticed that if I set NVreg_PreserveVideoMemoryAllocations=1 and don’t set up any of the stuff that interacts with /proc, then the bug goes away. Presumably it’s managing to back up all the video memory in the normal suspend routine.

What are the implications of turning on this option without setting up the disk-backed gfx memory backup? If the system isn’t able to preserve all the memory, does suspend fail, or will it suspend anyway and hope for the best during resume?
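
For anyone reproducing this, here is a minimal sketch of enabling the option on its own and checking that the driver picked it up. The file name under /etc/modprobe.d, the helper names, and the overridable CONF and PARAMS variables are assumptions for illustration. The module name may differ on distributions that rename it, and the "Name: value" layout of /proc/driver/nvidia/params is what I would expect rather than something confirmed for this driver version.

```shell
#!/bin/sh
# Sketch only: file name and helper names are hypothetical.
CONF="${CONF:-/etc/modprobe.d/nvidia-power-management.conf}"
PARAMS="${PARAMS:-/proc/driver/nvidia/params}"

# Write the module option; it takes effect the next time the nvidia module loads.
write_preserve_option() {
    printf 'options nvidia NVreg_PreserveVideoMemoryAllocations=1\n' > "$CONF"
}

# Succeeds if the loaded driver reports the option as enabled
# (assumes a "PreserveVideoMemoryAllocations: 1" line in the params file).
preserve_enabled() {
    grep -Eq '^PreserveVideoMemoryAllocations:[[:space:]]*1[[:space:]]*$' "$PARAMS"
}
```

Run write_preserve_option as root, reload the module or reboot, and then preserve_enabled should return success if the option was applied.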

The NVIDIA-provided script is crap: it doesn’t even work if two X servers (GNOME) are running, and it doesn’t really check whether an X server is running on the current VT. I replaced the whole /proc grepping with a simple fgconsole call, and then it works, unless an X server is running on vt63.
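
Roughly, the fgconsole-based variant could look like the sketch below. It is not the exact script, just an illustration of the idea: remember the active VT with fgconsole, switch to a spare VT with chvt, then write the requested action to /proc/driver/nvidia/suspend. The function name nv_sleep, the state file path, and the overridable variables are assumptions; VT 63 is the spare VT mentioned above, so the same caveat applies if an X server happens to be running there.

```shell
#!/bin/sh
# Sketch only: nv_sleep, VT_STATE and the variable overrides are hypothetical.
NV_SUSPEND="${NV_SUSPEND:-/proc/driver/nvidia/suspend}"
VT_STATE="${VT_STATE:-/run/nvidia-sleep.vt}"
SPARE_VT="${SPARE_VT:-63}"

nv_sleep() {
    case "$1" in
        suspend|hibernate)
            # Record the currently active VT so resume can switch back to it.
            fgconsole > "$VT_STATE" || return 1
            # Move away from the X server's VT before the driver saves state.
            chvt "$SPARE_VT" || return 1
            echo "$1" > "$NV_SUSPEND"
            ;;
        resume)
            echo resume > "$NV_SUSPEND" || return 1
            # Switch back to the VT that was active before suspend, if recorded.
            if [ -r "$VT_STATE" ]; then
                chvt "$(cat "$VT_STATE")"
                rm -f "$VT_STATE"
            fi
            ;;
        *)
            echo "usage: nv_sleep suspend|hibernate|resume" >&2
            return 2
            ;;
    esac
}
```

Called as root, nv_sleep suspend would run before the system sleeps and nv_sleep resume after it wakes, for example from a systemd sleep hook.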