As far as I can see, this should only matter when a lot of GPU RAM is wired/allocated at the time of the hibernate/suspend event, right? I have 8 GB of GPU RAM, and normally only 1 GB is used by Xorg, Chrome and some other apps.
Do you think I should try configuring the second option, using /proc/driver/nvidia/suspend?
Option one, which is the default (the one you should be using right now), is limited in functionality.
As said, I’d give it a try. It won’t hurt and is easily reverted. The only thing to watch out for is having enough space on the drive/mountpoint.
OK, so I’ve updated to the most recent driver from the PPA. I’ll try configuring option #2 and hibernating later today - if it crashes again I’ll provide another nvidia-bug-report.log.gz. Can I expect it to be analyzed by Nvidia after that?
There were some improvements I made in the 465.19.01 beta for suspend/resume with the power management stuff enabled. I know it might be difficult to test the beta if you’re using a PPA but would it be possible to give it a try?
Ah, I read the changelog, but I didn’t see anything in it about suspend/resume improvements, except the automatic installation.
For xyapus:
If you want to give it a try, make sure you purge all nvidia-driver PPA packages (apt purge 'nvidia*' 'libnvidia*') before installing via the .run file. And stop the X server before installation (i.e. systemctl isolate multi-user.target).
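The steps above could be strung together roughly like this. It is only a sketch: the installer file name is an assumption based on NVIDIA's usual naming scheme, the run and install_beta helpers are made up for illustration, and by default nothing is executed, the commands are only printed.

```shell
# Dry run by default: commands are only printed. Set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
# Assumed installer file name; adjust to whatever was actually downloaded.
INSTALLER="${INSTALLER:-./NVIDIA-Linux-x86_64-465.19.01.run}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

install_beta() {
    # Remove the packaged driver first (patterns quoted so the shell
    # does not expand them against files in the current directory).
    run sudo apt purge 'nvidia*' 'libnvidia*'
    # Leave the graphical session; this stops the X server.
    run sudo systemctl isolate multi-user.target
    # Run the .run installer.
    run sudo sh "$INSTALLER"
}

install_beta
```

Since isolating multi-user.target ends the graphical session, the real run has to happen from a text console or over SSH.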
There was a lot of intertwined behavior around VT switches and suspend/resume that I tried to untangle for the 465 series. All of it hinges on the NVreg_PreserveVideoMemoryAllocations=1 module parameter, which is still disabled by default in most cases. The suspend/hibernate/resume systemd units are required for the video memory preservation to function, which is why I made an effort to make the installer set those up automatically. If you’re using a PPA or other distribution packages, you’ll need to check with them to determine whether those systemd services are installed or enabled by default.
So the current state of things in 465.19.01 is that if you use the .run installer on a systemd distro, the only thing you’re supposed to need to do manually is enable NVreg_PreserveVideoMemoryAllocations=1.
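For reference, enabling that module parameter (spelled out in full as NVreg_PreserveVideoMemoryAllocations, as in NVIDIA's power management guide) usually comes down to a one-line modprobe options file. A minimal sketch, where the file name and the /var/tmp path are assumptions, and the file is written to a scratch directory unless MODPROBE_DIR is set:

```shell
# Scratch directory by default so the sketch is harmless to try; on a real
# system MODPROBE_DIR would be /etc/modprobe.d (which needs root).
MODPROBE_DIR="${MODPROBE_DIR:-$(mktemp -d)}"
CONF="$MODPROBE_DIR/nvidia-power-management.conf"   # hypothetical file name

# NVreg_TemporaryFilePath is optional; /var/tmp is only an example location
# and must have room for the contents of video memory.
cat > "$CONF" <<'EOF'
options nvidia NVreg_PreserveVideoMemoryAllocations=1 NVreg_TemporaryFilePath=/var/tmp
EOF

echo "wrote $CONF:"
cat "$CONF"
```

The option only takes effect once the nvidia module is reloaded, so in practice that means a reboot, and on distros that load the module from the initramfs probably regenerating the initramfs first.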
While using 460.67-0ubuntu0~0.20.04.1 I tried manually following the Configuring Power Management Support guide and installed the required systemd services. I’ve set up /tmp as a tmpfs of sufficient size using /etc/systemd/system/tmp.mount:
$ mount | grep /tmp
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,size=10485760k)
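A quick sanity check of that size against the card's memory could look like the sketch below. The function names are made up for illustration, the 8 GB card from the first post is taken to mean 8 GiB, and no safety margin is added, which is an assumption a real setup may want to revisit.

```shell
# Extract the size= option (in KiB) from one line of `mount` output.
tmpfs_size_kib() {
    printf '%s\n' "$1" | sed -n 's/.*size=\([0-9][0-9]*\)k.*/\1/p'
}

# Succeeds if the tmpfs size is at least the video memory size (both in KiB).
fits_vram() {
    [ "$1" -ge "$2" ]
}

line='tmpfs on /tmp type tmpfs (rw,nosuid,nodev,size=10485760k)'
size_kib=$(tmpfs_size_kib "$line")
vram_kib=$((8 * 1024 * 1024))   # assumption: 8 GB card means 8 GiB

if fits_vram "$size_kib" "$vram_kib"; then
    echo "tmpfs ($size_kib KiB) can hold all of video memory ($vram_kib KiB)"
else
    echo "tmpfs ($size_kib KiB) is smaller than video memory ($vram_kib KiB)"
fi
```

On a live system the sample line would be replaced by the actual output of mount | grep /tmp.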
I cannot resume from hibernate when NVreg_PreserveVideoMemoryAllocations=1
I’ve also tried other TemporaryFilePath locations, as the doc states that
To achieve the best performance, file system types other than tmpfs are recommended at this time.
So I changed to NVreg_TemporaryFilePath=/tmp.nvidia and created the directory /tmp.nvidia, but I still can’t resume from hibernation, with the same error as above.
@aplattner from the changelog it’s not clear to me whether anything regarding my problem has changed in the driver itself between v460 and v465. I can see that the installation of the systemd units is now automated, but I’ve managed to do that manually, so do I still have to go with that beta? Honestly, I’m not too comfortable with betas…
tmpfs is a temporary filesystem that resides in memory and/or swap partition(s). Mounting directories as tmpfs can be an effective way of speeding up accesses to their files, or to ensure that their contents are automatically cleared upon reboot.
Having your resume file cleared upon reboot sounds like a no-go, as the file would be gone.
Can you please do systemctl status nvidia-suspend nvidia-hibernate nvidia-resume to verify that the systemd services are actually enabled? Also, how did you trigger the suspend? You need to use systemctl suspend or systemctl hibernate rather than writing to /sys/power/state directly.
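For a more compact answer than the full status output, something like this prints just the enablement state of each of the three units. A sketch only: check_units is a made-up helper name, and the unit names are assumed to be the stock ones shipped with the driver.

```shell
# Print "<unit>: <state>" for each NVIDIA power management unit.
# `systemctl is-enabled` exits non-zero for disabled or missing units,
# so the failure is swallowed and an empty answer is reported as "unknown".
check_units() {
    for unit in nvidia-suspend.service nvidia-hibernate.service nvidia-resume.service; do
        state=$(systemctl is-enabled "$unit" 2>/dev/null || true)
        echo "$unit: ${state:-unknown}"
    done
}

check_units
```

All three should report enabled; the sleep itself then has to be triggered with systemctl suspend or systemctl hibernate so that systemd runs them.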
Data in tmpfs is included in the hibernation image that the kernel writes to the disk, so that data should still be there during a resume from hibernation. I.e. it might be slower but it should at least still work as long as tmpfs has enough space to store the contents of video memory.
That’s awfully strange – if those units are enabled then systemd-hibernate.service should have run them. Does systemctl status systemd-hibernate.service show anything about it running the nvidia ones?
Yes, quite a bit changed in v465. There were some fixes for data corruption on some GPUs, and the interaction between the X server and OpenGL clients during VT switches (which happen during suspend too) was significantly simplified when NVreg_PreserveVideoMemoryAllocations=1 is enabled.
Yeah, if it actually started that service during hibernate then there should be messages about it in the journal. My system isn’t set up for hibernate but this is what I get for the similar suspend path:
> systemctl status systemd-suspend
● systemd-suspend.service - Suspend
Loaded: loaded (/usr/lib/systemd/system/systemd-suspend.service; static)
Active: inactive (dead)
Docs: man:systemd-suspend.service(8)
Mar 30 23:08:49 aplattner systemd[1]: Starting Suspend...
Mar 30 23:08:49 aplattner systemd-sleep[2263539]: Suspending system...
Mar 31 07:43:38 aplattner systemd-sleep[2263539]: System resumed.
Mar 31 07:43:38 aplattner systemd[1]: systemd-suspend.service: Succeeded.
Mar 31 07:43:38 aplattner systemd[1]: Finished Suspend.
Mar 31 23:29:18 aplattner systemd[1]: Starting Suspend...
Mar 31 23:29:18 aplattner systemd-sleep[3774631]: Suspending system...
Apr 01 00:39:58 aplattner systemd-sleep[3774631]: System resumed.
Apr 01 00:39:58 aplattner systemd[1]: systemd-suspend.service: Succeeded.
Apr 01 00:39:58 aplattner systemd[1]: Finished Suspend.
I could give that beta a try later on, but at the moment I’m not sure my setup is configured correctly, so I can’t be sure that hibernate is really unusable with the latest PPA driver I have…
My concern about betas is that this is my main PC, and crippling it with beta drivers doesn’t sound like a good idea…
If I do systemctl suspend then I can also see log messages like the ones you have. These messages do not survive reboots, so I can’t verify 100% whether they are there for systemctl hibernate: when I wake up from hibernate, the system can’t resume and reboots, and after the reboot these messages are gone.
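One way to get at those lost messages, assuming the journal is stored persistently (with journald's default settings that means /var/log/journal exists), is to ask journalctl for the previous boot. A rough sketch; the helper name show_previous_boot is made up here, and the unit list assumes the NVIDIA services are installed under their stock names:

```shell
# Print messages from the boot before the current one (-b -1) for the
# hibernate-related units. Multiple -u options are ORed together.
show_previous_boot() {
    journalctl -b -1 --no-pager \
        -u systemd-hibernate.service \
        -u nvidia-hibernate.service \
        -u nvidia-resume.service
}

if [ -d /var/log/journal ]; then
    show_previous_boot || echo "no entries found for the previous boot"
else
    # With the default Storage=auto, journald keeps logs only under /run
    # unless this directory exists.
    echo "journal is not persistent; create /var/log/journal and restart systemd-journald first"
fi
```

If the NVIDIA units really ran before the machine powered off, their messages should show up here even though the resume itself failed.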