Nvidia Laptop - Unable to suspend

Hi,
Yesterday I updated to Pop OS 22.04 (from 21.10) and now I am unable to suspend due to NVIDIA errors. My kernel version is 5.18.10-76051810-generic and I tried both the 470 and 510 NVIDIA driver versions (I first updated to 510, then rolled back to 470).

Initially I had this dmesg log during suspend: [ 0.000000] microcode: microcode updated early to revision 0xf0, date = 2021- - Pastebin.com

Then I disabled the three systemd services (hibernate, suspend, resume). I tried again to suspend but I got new errors (I was unable to get the dmesg log, but it was something like “some peripherals (GPU) are still not suspended, interrupting”).

I re-enabled the three services and now I get the following errors:


Stopping disk
NVRM: GPU 0000:05:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to the 'Configuring Power Management Support' section >
PM: pci_pm_suspend(): nv_pmops_suspend+0x0/0x20 [nvidia] returns -5
PM: dpm_run_callback(): pci_pm_suspend+0x0/0x160 returns -5
nvidia 0000:05:00.0: PM: failed to suspend async: error -5

So I receive some -5 return values, and a message immediately says “resuming from suspend”. From a different TTY session I can see the systemd nvidia-suspend.service status, and it says something like “process exited with error 9/SIGNAL”. Why does this happen? How can I solve it? Thank you

EDIT: this is the systemd suspend status log:

× nvidia-suspend.service - NVIDIA system suspend actions
     Loaded: loaded (/lib/systemd/system/nvidia-suspend.service; enabled; vendor preset: enabled)
     Active: failed (Result: signal) since Mon 2022-07-25 13:05:41 CEST; 43s ago
    Process: 11867 ExecStart=/usr/bin/logger -t suspend -s nvidia-suspend.service (code=exited, status=0/SUCCESS)
    Process: 11868 ExecStart=/usr/bin/nvidia-sleep.sh suspend (code=killed, signal=KILL)
   Main PID: 11868 (code=killed, signal=KILL)
        CPU: 37ms

Jul 25 13:05:41 pop-os systemd[1]: Starting NVIDIA system suspend actions...
Jul 25 13:05:41 pop-os suspend[11867]: nvidia-suspend.service
Jul 25 13:05:41 pop-os logger[11867]: <13>Jul 25 13:05:41 suspend: nvidia-suspend.service
Jul 25 13:05:41 pop-os systemd[1]: nvidia-suspend.service: Main process exited, code=killed, status=9/KILL
Jul 25 13:05:41 pop-os systemd[1]: nvidia-suspend.service: Failed with result 'signal'.
Jul 25 13:05:41 pop-os systemd[1]: Failed to start NVIDIA system suspend actions.

nvidia-bug-report.log.gz (370.2 KB)

Please try setting nvidia module option NVreg_TemporaryFilePath=/var/tmp

Thank you for the answer.

Where am I supposed to add this line? In /etc/modprobe.d/nvidia-power-management.conf? I ask because I don’t have that file.

ls /etc/modprobe.d
       alsa-base.conf                        blacklist-framebuffer.conf         intel-microcode-blacklist.conf 
       amd64-microcode-blacklist.conf        blacklist-modem.conf               iwlwifi.conf 
       blacklist-ath_pci.conf                blacklist-oss.conf                 mdadm.conf 
       blacklist.conf                        blacklist-rare-network.conf        nvidia-graphics-drivers-kms.conf 
       blacklist-firewire.conf               dkms.conf                          system76-power.conf

Just create any file there and insert
options nvidia NVreg_TemporaryFilePath=/var/tmp
then run
sudo update-initramfs -u
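As a rough sketch of the first step, a small helper that writes the option line into a modprobe.d-style directory. The function name write_nvidia_option and the file name nvidia-tmp-path.conf are made up for illustration; any file name ending in .conf should do.

```shell
# Write one "options nvidia ..." line into a modprobe.d-style directory.
# The file name below is an arbitrary example, not something the driver expects.
write_nvidia_option() {
    dir="$1"      # target directory, e.g. /etc/modprobe.d
    option="$2"   # module option, e.g. NVreg_TemporaryFilePath=/var/tmp
    printf 'options nvidia %s\n' "$option" > "$dir/nvidia-tmp-path.conf"
}
```

From a root shell this would be called as write_nvidia_option /etc/modprobe.d NVreg_TemporaryFilePath=/var/tmp, followed by update-initramfs -u and a reboot so the module is loaded with the new option.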

OK, I tried, same problem :( What can it be?


❯ cat /proc/driver/nvidia/params
ResmanDebugLevel: 4294967295
RmLogonRC: 1
ModifyDeviceFiles: 1
DeviceFileUID: 0
DeviceFileGID: 0
DeviceFileMode: 438
InitializeSystemMemoryAllocations: 1
UsePageAttributeTable: 4294967295
EnableMSI: 1
RegisterForACPIEvents: 1
EnablePCIeGen3: 0
MemoryPoolSize: 0
KMallocHeapMaxSize: 0
VMallocHeapMaxSize: 0
IgnoreMMIOCheck: 0
TCEBypassMode: 0
EnableStreamMemOPs: 0
EnableUserNUMAManagement: 1
NvLinkDisable: 0
RmProfilingAdminOnly: 1
PreserveVideoMemoryAllocations: 1
EnableS0ixPowerManagement: 0
S0ixPowerManagementVideoMemoryThreshold: 256
DynamicPowerManagement: 2
DynamicPowerManagementVideoMemoryThreshold: 200
RegisterPCIDriver: 1
EnablePCIERelaxedOrderingMode: 0
EnableGpuFirmware: 2
RegistryDwords: ""
RegistryDwordsPerDevice: ""
RmMsg: ""
GpuBlacklist: ""
TemporaryFilePath: "/var/tmp"
ExcludedGpus: ""
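To check single values in that output without reading the whole list, something along these lines could be used. This is only a sketch; nvidia_param is a hypothetical helper name, and it assumes every line has the "Name: value" shape shown above.

```shell
# Print the value of one parameter from the driver's params file.
# The second argument is optional and defaults to the live procfs file.
nvidia_param() {
    name="$1"
    file="${2:-/proc/driver/nvidia/params}"
    # Lines look like "Name: value"; compare the whole name so that
    # DynamicPowerManagement does not also match the ...Threshold entry.
    awk -F': ' -v n="$name" '$1 == n { print $2 }' "$file"
}
```

For example, nvidia_param TemporaryFilePath and nvidia_param PreserveVideoMemoryAllocations show whether the options from /etc/modprobe.d were actually picked up by the loaded module.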

IDK, what kills /usr/bin/nvidia-sleep.sh with SIGKILL?

How can I see what kills the sh script?
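One way to start looking, as a sketch only: filter the journal of the current boot for lines that usually accompany a forced kill. The helper name kill_hints and the pattern list are my own guess at what is worth searching for (OOM killer messages, systemd timeouts, the status=9/KILL line itself); they are not an official diagnostic.

```shell
# Read log lines on stdin and keep those that hint at a forced kill.
# The pattern list is a guess, extend it as needed.
kill_hints() {
    grep -i -E 'out of memory|oom-kill|killed process|timed out|status=9/KILL'
}

# Typical use against the journal of the current boot (not run here):
#   journalctl -b --no-pager | kill_hints
#   journalctl -b --no-pager -u nvidia-suspend.service
```

If nothing besides the status=9/KILL line itself shows up, the journal probably does not record who sent the signal, and a look at the full unit definition with systemctl cat nvidia-suspend.service may be the next step.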

Moreover, what do you think about disabling PreserveVideoMemoryAllocations? That way I could disable the three systemd services and see if it works.
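Before changing it, it helps to know which file sets the option in the first place. A minimal sketch, assuming the option lives in one of the usual modprobe.d directories; find_nvidia_option is a hypothetical helper name.

```shell
# Search one or more directories for a module option and print
# matching lines as "file:line:content". Unreadable or missing
# paths are skipped silently (-s).
find_nvidia_option() {
    option="$1"
    shift
    grep -r -s -n -- "$option" "$@"
}

# Typical use (not run here):
#   find_nvidia_option NVreg_PreserveVideoMemoryAllocations \
#       /etc/modprobe.d /lib/modprobe.d /usr/lib/modprobe.d
```

Once the file is found, changing the value there to NVreg_PreserveVideoMemoryAllocations=0 and running sudo update-initramfs -u should turn the feature off after a reboot. Whether suspend then works without the three services is something I have not verified.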

I found another thread on Reddit with the same stack trace as mine: https://www.reddit.com/r/pop_os/comments/uj4gld/another_broken_install_after_kernel_517_update/

Apparently, the problem is related to the kernel 5.16/17. However, I am not able to revert back to 5.16 to test it because the update removed all the older kernel versions.

Should this bug be addressed by NVIDIA?

Actually, the mentioned backtrace is not caught in the logs, only that nvidia-sleep.sh is killed, which makes it hard to find any cause. AFAIK, kernel 5.15 should be the 22.04 stock kernel, so you should be able to return to it by running
sudo apt install --install-recommends linux-generic