Ubuntu 20.04 with nvidia-460 driver freezes randomly after resume from suspend/hibernate

tmpfs is a temporary filesystem that resides in memory and/or swap partition(s). Mounting directories as tmpfs can be an effective way of speeding up accesses to their files, or to ensure that their contents are automatically cleared upon reboot.

Having your resume file cleared upon reboot sounds like a no-go, since the file would be gone.

Can you please do systemctl status nvidia-suspend nvidia-hibernate nvidia-resume to verify that the systemd services are actually enabled? Also, how did you trigger the suspend? You need to use systemctl suspend or systemctl hibernate rather than writing to /sys/power/state directly.
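
A compact way to check the same three units is sketched below. It assumes a POSIX shell and a systemd-based system where systemctl is-enabled is available; the function name nvidia_unit_states is made up for illustration and is not part of the driver.

```shell
# Print "<unit>: <state>" for each NVIDIA power-management unit.
# nvidia_unit_states is a hypothetical helper name, not part of the driver.
nvidia_unit_states() {
    for unit in nvidia-suspend nvidia-hibernate nvidia-resume; do
        # is-enabled prints a state such as enabled, disabled or masked.
        # It exits non-zero for states other than enabled, so the exit
        # status is ignored and only the printed state is kept.
        state=$(systemctl is-enabled "$unit.service" 2>/dev/null) || true
        printf '%s: %s\n' "$unit" "${state:-unknown}"
    done
}
```

Calling nvidia_unit_states prints one line per unit. A state other than enabled would suggest that the driver's suspend hooks are not being run by systemd.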

Data in tmpfs is included in the hibernation image that the kernel writes to the disk, so that data should still be there during a resume from hibernation. I.e. it might be slower but it should at least still work as long as tmpfs has enough space to store the contents of video memory.
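
To tell which of the two configurations is actually in effect, it can help to check what is mounted at the directory in question. The sketch below assumes that directory is /tmp, as in this thread, and reads the mount table from /proc/mounts; fstype_of is a hypothetical helper name.

```shell
# Print the filesystem type mounted at a given mount point.
# fstype_of is a hypothetical helper; the optional second argument points it
# at a saved copy of the mount table instead of /proc/mounts.
fstype_of() {
    # Fields in /proc/mounts: device, mount point, type, options, ...
    # The last matching entry wins, since later mounts shadow earlier ones.
    awk -v mp="$1" '$2 == mp { t = $3 } END { if (t != "") print t }' "${2:-/proc/mounts}"
}
```

For example, fstype_of /tmp prints tmpfs when /tmp is memory-backed, and prints nothing when /tmp is just a directory on the root filesystem.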

$ systemctl status nvidia-suspend nvidia-hibernate nvidia-resume
● nvidia-suspend.service - NVIDIA system suspend actions
     Loaded: loaded (/etc/systemd/system/nvidia-suspend.service; enabled; vendor preset: enabled)
     Active: inactive (dead)

● nvidia-hibernate.service - NVIDIA system hibernate actions
     Loaded: loaded (/etc/systemd/system/nvidia-hibernate.service; enabled; vendor preset: enabled)
     Active: inactive (dead)

● nvidia-resume.service - NVIDIA system resume actions
     Loaded: loaded (/etc/systemd/system/nvidia-resume.service; enabled; vendor preset: enabled)
     Active: inactive (dead)

And yes, I use sudo systemctl hibernate.

I tried both configurations, with and without tmpfs, with no luck.

Ah OK, it's a little weird to wrap your head around, but I'll take it ;-)

That’s awfully strange – if those units are enabled then systemd-hibernate.service should have run them. Does systemctl status systemd-hibernate.service show anything about it running the nvidia ones?

Nope, should it?

$ sudo systemctl status systemd-hibernate.service
● systemd-hibernate.service - Hibernate
     Loaded: loaded (/lib/systemd/system/systemd-hibernate.service; static; vendor preset: enabled)
     Active: inactive (dead)
       Docs: man:systemd-suspend.service(8)

Yes, quite a bit changed in v465. There were some fixes for data corruption on some GPUs, and the interaction between the X server and OpenGL clients during VT switches (which happen during suspend too) was significantly simplified when NVreg_PreserveVideoMemoryAllocations=1 is enabled.
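
Whether that option is active on a running system can be read back from the driver. The sketch below is an assumption-laden helper: it assumes the loaded driver exposes its settings as "Name: value" lines in /proc/driver/nvidia/params, and nvidia_param is a made-up function name.

```shell
# Print the value of one driver parameter from a params-style file with
# "Name: value" lines. nvidia_param is a hypothetical helper; the default
# path /proc/driver/nvidia/params is an assumption about the loaded driver.
nvidia_param() {
    sed -n "s/^$1: *//p" "${2:-/proc/driver/nvidia/params}"
}
```

For example, nvidia_param PreserveVideoMemoryAllocations would be expected to print 1 when the option is set, which matches the "PreserveVideoMemoryAllocations module parameter is set" message the kernel log in this thread reports.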

Yeah, if it actually started that service during hibernate then there should be messages about it in the journal. My system isn’t set up for hibernate but this is what I get for the similar suspend path:

> systemctl status systemd-suspend
● systemd-suspend.service - Suspend
     Loaded: loaded (/usr/lib/systemd/system/systemd-suspend.service; static)
     Active: inactive (dead)
       Docs: man:systemd-suspend.service(8)

Mar 30 23:08:49 aplattner systemd[1]: Starting Suspend...
Mar 30 23:08:49 aplattner systemd-sleep[2263539]: Suspending system...
Mar 31 07:43:38 aplattner systemd-sleep[2263539]: System resumed.
Mar 31 07:43:38 aplattner systemd[1]: systemd-suspend.service: Succeeded.
Mar 31 07:43:38 aplattner systemd[1]: Finished Suspend.
Mar 31 23:29:18 aplattner systemd[1]: Starting Suspend...
Mar 31 23:29:18 aplattner systemd-sleep[3774631]: Suspending system...
Apr 01 00:39:58 aplattner systemd-sleep[3774631]: System resumed.
Apr 01 00:39:58 aplattner systemd[1]: systemd-suspend.service: Succeeded.
Apr 01 00:39:58 aplattner systemd[1]: Finished Suspend.

I could give that beta a try later on, but at the moment I’m not sure my setup is configured correctly, so I cannot be sure that hibernate is totally unusable with the latest PPA driver I have…

My concern about betas is that this is my main PC and crippling it with beta drivers doesn’t sound like a good idea…

If I run systemctl suspend, I also see log messages like the ones you have. These messages do not survive reboots, so I cannot fully verify whether they appear for systemctl hibernate: when I wake from hibernate, the system can’t resume and reboots, and after the reboot these messages are gone.
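
Whether journal messages survive a reboot depends on journald's storage setting. With the default Storage=auto, journald writes to disk only if /var/log/journal exists. A minimal check, with journal_is_persistent as a hypothetical helper name:

```shell
# Succeed if the journal directory exists, which is what journald's default
# Storage=auto setting looks for before writing logs to disk.
# journal_is_persistent is a hypothetical helper name.
journal_is_persistent() {
    [ -d "${1:-/var/log/journal}" ]
}

# Typical follow-up, run manually (not executed here):
#   journalctl --list-boots   # boots the journal still has records for
#   journalctl -b -1 -k       # kernel messages from the previous boot
```

If the check fails, creating /var/log/journal as root and restarting systemd-journald is the usual way to turn persistence on, after which the messages from a failed hibernate attempt should be readable from the next boot.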

I can see that, sure.

It’s good to know that you’re at least seeing the log messages from suspend. Is suspend & resume working correctly for you and it’s just hibernate that’s not working?

If possible, it might be useful to run journalctl -f & from an SSH session before triggering hibernate. If there’s something going wrong during the hibernation phase then maybe you’d see it that way. If the problem is occurring during resume then it’s a little tougher – you might be able to enable verbose logging on the kernel command line somewhere to see if there are any errors on the console, but you won’t be able to see them on the SSH connection that way.

Edit: Oh, I guess the messages in your earlier comment are from the resume phase. It’s strange that the nvidia kernel module doesn’t think it was suspended with the procfs/systemd interface there.

OK, the forum blocked me from replying earlier. Thanks for the directions. I’ll try again later with the different options you mention and get back to you with the results.

Yes, the message log is from the resume stage. I’ll try the SSH session method to see if there are any errors during the hibernation phase.

So I tried different options but cannot get hibernation to work. Here’s the log from the suspend stage. I can see that nvidia-hibernate.service is definitely being called and succeeds:

Apr 04 12:30:21 gingerblade systemd[1]: Reached target Sleep.
Apr 04 12:30:21 gingerblade systemd[1]: Starting NVIDIA system hibernate actions...
Apr 04 12:30:21 gingerblade hibernate[8060]: nvidia-hibernate.service
Apr 04 12:30:21 gingerblade logger[8060]: <13>Apr  4 12:30:21 hibernate: nvidia-hibernate.service
Apr 04 12:30:21 gingerblade systemd[1]: nvidia-hibernate.service: Succeeded.
Apr 04 12:30:21 gingerblade systemd[1]: Finished NVIDIA system hibernate actions.
Apr 04 12:30:21 gingerblade systemd[1]: Starting Hibernate...
Apr 04 12:30:21 gingerblade kernel: PM: Image not found (code -22)
Apr 04 12:30:21 gingerblade systemd-sleep[8071]: Suspending system...

Then in the resume stage I get the same error: the nvidia driver breaks the resume and the system does a fresh boot:

Apr 04 12:31:41 gingerblade kernel: PM: Image signature found, resuming
Apr 04 12:31:41 gingerblade kernel: PM: hibernation: resume from hibernation
Apr 04 12:31:41 gingerblade kernel: Freezing user space processes ... (elapsed 0.001 seconds) done.
Apr 04 12:31:41 gingerblade kernel: OOM killer disabled.
Apr 04 12:31:41 gingerblade kernel: Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
Apr 04 12:31:41 gingerblade kernel: PM: hibernation: Marking nosave pages: [mem 0x00000000-0x00000fff]
Apr 04 12:31:41 gingerblade kernel: PM: hibernation: Marking nosave pages: [mem 0x0005e000-0x0005efff]
Apr 04 12:31:41 gingerblade kernel: PM: hibernation: Marking nosave pages: [mem 0x000a0000-0x000fffff]
Apr 04 12:31:41 gingerblade kernel: PM: hibernation: Marking nosave pages: [mem 0x7c928000-0x7c928fff]
Apr 04 12:31:41 gingerblade kernel: PM: hibernation: Marking nosave pages: [mem 0x7c948000-0x7c948fff]
Apr 04 12:31:41 gingerblade kernel: PM: hibernation: Marking nosave pages: [mem 0x7cf48000-0x7cf61fff]
Apr 04 12:31:41 gingerblade kernel: PM: hibernation: Marking nosave pages: [mem 0x7f15b000-0x7f15bfff]
Apr 04 12:31:41 gingerblade kernel: PM: hibernation: Marking nosave pages: [mem 0x82a8e000-0x85c4dfff]
Apr 04 12:31:41 gingerblade kernel: PM: hibernation: Marking nosave pages: [mem 0x85c4f000-0xffffffff]
Apr 04 12:31:41 gingerblade kernel: PM: hibernation: Basic memory bitmaps created
Apr 04 12:31:41 gingerblade kernel: PM: Using 3 thread(s) for decompression
Apr 04 12:31:41 gingerblade kernel: PM: Loading and decompressing image data (1130381 pages)...
Apr 04 12:31:41 gingerblade kernel: PM: Image loading progress:   0%
Apr 04 12:31:41 gingerblade kernel: PM: Image loading progress:  10%
Apr 04 12:31:41 gingerblade kernel: PM: Image loading progress:  20%
Apr 04 12:31:41 gingerblade kernel: PM: Image loading progress:  30%
Apr 04 12:31:41 gingerblade kernel: PM: Image loading progress:  40%
Apr 04 12:31:41 gingerblade kernel: PM: Image loading progress:  50%
Apr 04 12:31:41 gingerblade kernel: PM: Image loading progress:  60%
Apr 04 12:31:41 gingerblade kernel: PM: Image loading progress:  70%
Apr 04 12:31:41 gingerblade kernel: PM: Image loading progress:  80%
Apr 04 12:31:41 gingerblade kernel: PM: Image loading progress:  90%
Apr 04 12:31:41 gingerblade kernel: PM: Image loading progress: 100%
Apr 04 12:31:41 gingerblade kernel: PM: Image loading done
Apr 04 12:31:41 gingerblade kernel: PM: hibernation: Read 4521524 kbytes in 3.90 seconds (1159.36 MB/s)
Apr 04 12:31:41 gingerblade kernel: PM: Image successfully loaded
Apr 04 12:31:41 gingerblade kernel: printk: Suspending console(s) (use no_console_suspend to debug)
Apr 04 12:31:41 gingerblade kernel: NVRM: GPU 0000:01:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to the>
Apr 04 12:31:41 gingerblade kernel: PM: pci_pm_freeze(): nv_pmops_freeze+0x0/0x20 [nvidia] returns -5
Apr 04 12:31:41 gingerblade kernel: PM: dpm_run_callback(): pci_pm_freeze+0x0/0xc0 returns -5
Apr 04 12:31:41 gingerblade kernel: PM: Device 0000:01:00.0 failed to quiesce async: error -5
Apr 04 12:31:41 gingerblade kernel: nvme nvme0: 12/0/0 default/read/poll queues
Apr 04 12:31:41 gingerblade kernel: fbcon: Taking over console
Apr 04 12:31:41 gingerblade kernel: Console: switching to colour frame buffer device 240x67
Apr 04 12:31:41 gingerblade kernel: PM: hibernation: Failed to load image, recovering.
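
Most of a resume log like the one above is routine progress output. A rough filter that keeps only the lines pointing at the failure is sketched below; the pattern is a guess based on the messages seen in this thread, and pm_failures is a hypothetical helper name.

```shell
# Keep only the lines that usually matter when a resume fails: NVRM driver
# messages, callbacks returning a negative error code, and "failed to"
# reports. pm_failures is a hypothetical helper that reads log text on stdin.
pm_failures() {
    # "|| true" keeps the function from failing when nothing matches.
    grep -E 'NVRM:|returns -[0-9]+|[Ff]ailed to' || true
}
```

It can be fed from the journal, for example journalctl -k -b | pm_failures. On the log above it would keep the NVRM line, the two "returns -5" callbacks (error 5 is EIO), the "failed to quiesce" line, and the final "Failed to load image" line.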

I tried with and without a tmpfs mount for the /tmp folder, and it makes no difference.

Is there anything else I can provide you with? Any logs that might be helpful?

Is this log still with the release 460 drivers? I haven’t had a chance to set my system back up for hibernate to try the 460 drivers yet to see if I can reproduce this, sorry.

I tried the latest beta today, but unfortunately I couldn’t install it.
I first removed the existing drivers with apt remove and then tried to install the beta by running sudo ./NVIDIA-Linux-x86_64-465.19.01.run, but the installation fails. I have attached the /var/log/nvidia-installer.log file.

nvidia-installer.log (3.3 KB)

@xyapus I was wondering if you have managed to fix the issue? I’m running into a similar issue after upgrading to 460.

With 460 and 465 versions of the driver, I cannot even see the login screen on my Ubuntu 20.04. I was able to figure out the cause was Nvidia drivers and remove them in the recovery mode. After the removal of the drivers, I can now boot my Ubuntu without any issues. However, the open-source driver is nowhere near the Nvidia drivers. So, the performance is really bad.

Although the topic is about the freezes happening after resuming from the sleep mode, I cannot even boot my Ubuntu with the 460 and 465 versions of the driver. The issue happens when booting the PC normally. I am using RTX 2060 Super.

Edit: It looks like my system upgraded the drivers from 460.73.01 to 460.80. 460.73.01 was working without issues. So, it must be related to the changes made in 460.80.

Edit 2: I downloaded 460.73.01 from launchpad and manually installed it. My system boots without issues now. I hope this gets fixed in the next release.

Same problem for me: the screen freezes and the system reboots randomly with Ubuntu 20, an NVIDIA GTX 1660 Ti, and driver 460.80.

I may join this discussion too. I’m on Ubuntu 20.04, running nvidia-drivers at 460.80

❯ systemctl status nvidia-suspend nvidia-hibernate nvidia-resume
● nvidia-suspend.service
     Loaded: masked (Reason: Unit nvidia-suspend.service is masked.)
     Active: inactive (dead)

● nvidia-hibernate.service
     Loaded: masked (Reason: Unit nvidia-hibernate.service is masked.)
     Active: inactive (dead)

● nvidia-resume.service
     Loaded: masked (Reason: Unit nvidia-resume.service is masked.)
     Active: inactive (dead)

❯ systemctl status systemd-suspend
● systemd-suspend.service - Suspend
     Loaded: loaded (/lib/systemd/system/systemd-suspend.service; static; vendor preset: enabled)
     Active: inactive (dead)
       Docs: man:systemd-suspend.service(8)

I cannot suspend the system by running systemctl suspend. Attached are the journal log captured right after issuing the command and the nvidia bug report.

As an update, it appears that with 465.27 I can suspend the system with no issues.
journal.log (35.3 KB)
nvidia-bug-report.log.gz (426.1 KB)