Hi,
Yesterday I updated to Pop OS 22.04 (from 21.10) and now I am unable to suspend due to NVIDIA errors. My kernel version is 5.18.10-76051810-generic and I tried both the 470 and 510 NVIDIA driver versions (I first updated to 510, then rolled back to 470).
Initially I had this dmesg log during suspend: [ 0.000000] microcode: microcode updated early to revision 0xf0, date = 2021- - Pastebin.com
Then I disabled the three systemd services (hibernate, suspend, resume). I tried to suspend again but got new errors (I was unable to get the dmesg log, but it was something like “some peripherals (GPU) are still not suspended, interrupting”).
I re-enabled the three services and now I get the following errors:
Stopping disk
NVRM: GPU 0000:05:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to the 'Configuring Power Management Support' section >
PM: pci_pm_suspend(): nv_pmops_suspend+0x0/0x20 [nvidia] returns -5
PM: dpm_run_callback(): pci_pm_suspend+0x0/0x160 returns -5
nvidia 0000:05:00.0: PM: failed to suspend async: error -5
So I get some -5 return values, and a message immediately says “resuming from suspend”. From a different TTY session I can see the status of the systemd nvidia-suspend.service, and it says something like “process exited with error 9/SIGNAL”. Why does this happen? How can I solve it? Thank you
EDIT: this is the systemd suspend status log:
× nvidia-suspend.service - NVIDIA system suspend actions
Loaded: loaded (/lib/systemd/system/nvidia-suspend.service; enabled; vendor preset: enabled)
Active: failed (Result: signal) since Mon 2022-07-25 13:05:41 CEST; 43s ago
Process: 11867 ExecStart=/usr/bin/logger -t suspend -s nvidia-suspend.service (code=exited, status=0/SUCCESS)
Process: 11868 ExecStart=/usr/bin/nvidia-sleep.sh suspend (code=killed, signal=KILL)
Main PID: 11868 (code=killed, signal=KILL)
CPU: 37ms
Jul 25 13:05:41 pop-os systemd[1]: Starting NVIDIA system suspend actions...
Jul 25 13:05:41 pop-os suspend[11867]: nvidia-suspend.service
Jul 25 13:05:41 pop-os logger[11867]: <13>Jul 25 13:05:41 suspend: nvidia-suspend.service
Jul 25 13:05:41 pop-os systemd[1]: nvidia-suspend.service: Main process exited, code=killed, status=9/KILL
Jul 25 13:05:41 pop-os systemd[1]: nvidia-suspend.service: Failed with result 'signal'.
Jul 25 13:05:41 pop-os systemd[1]: Failed to start NVIDIA system suspend actions.
nvidia-bug-report.log.gz (370.2 KB)
Please try setting nvidia module option NVreg_TemporaryFilePath=/var/tmp
Thank you for the answer.
Where am I supposed to add this line? In /etc/modprobe.d/nvidia-power-management.conf? Because I don’t have that file.
ls /etc/modprobe.d
alsa-base.conf blacklist-framebuffer.conf intel-microcode-blacklist.conf
amd64-microcode-blacklist.conf blacklist-modem.conf iwlwifi.conf
blacklist-ath_pci.conf blacklist-oss.conf mdadm.conf
blacklist.conf blacklist-rare-network.conf nvidia-graphics-drivers-kms.conf
blacklist-firewire.conf dkms.conf system76-power.conf
Just create any file there and insert
options nvidia NVreg_TemporaryFilePath=/var/tmp
then run
sudo update-initramfs -u
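For anyone following along, here is a minimal sketch of the step above. The function name `write_nvidia_tmp_option` is made up for illustration, and the file name `nvidia-power-management.conf` is only the one asked about earlier in the thread; any `.conf` file in `/etc/modprobe.d` should do. The directory is a parameter so the function can be tried against a scratch directory first.

```shell
#!/bin/sh
# Sketch: add the NVreg_TemporaryFilePath option to a modprobe.d file.
# write_nvidia_tmp_option is a hypothetical helper, not part of the driver.
write_nvidia_tmp_option() {
    conf_dir="${1:-/etc/modprobe.d}"   # target directory (override for a dry run)
    tmp_path="${2:-/var/tmp}"          # value suggested in this thread
    conf_file="$conf_dir/nvidia-power-management.conf"   # arbitrary file name
    line="options nvidia NVreg_TemporaryFilePath=$tmp_path"

    # Do nothing if the exact line is already present.
    if [ -f "$conf_file" ] && grep -qxF -- "$line" "$conf_file"; then
        return 0
    fi
    printf '%s\n' "$line" >> "$conf_file"
}
```

Called without arguments from a root shell it writes to `/etc/modprobe.d`; after that, run `sudo update-initramfs -u` and reboot so the module picks up the option. The new value should then show up as `TemporaryFilePath` in `/proc/driver/nvidia/params`.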
OK, I tried, same problem :( What could it be?
❯ cat /proc/driver/nvidia/params
ResmanDebugLevel: 4294967295
RmLogonRC: 1
ModifyDeviceFiles: 1
DeviceFileUID: 0
DeviceFileGID: 0
DeviceFileMode: 438
InitializeSystemMemoryAllocations: 1
UsePageAttributeTable: 4294967295
EnableMSI: 1
RegisterForACPIEvents: 1
EnablePCIeGen3: 0
MemoryPoolSize: 0
KMallocHeapMaxSize: 0
VMallocHeapMaxSize: 0
IgnoreMMIOCheck: 0
TCEBypassMode: 0
EnableStreamMemOPs: 0
EnableUserNUMAManagement: 1
NvLinkDisable: 0
RmProfilingAdminOnly: 1
PreserveVideoMemoryAllocations: 1
EnableS0ixPowerManagement: 0
S0ixPowerManagementVideoMemoryThreshold: 256
DynamicPowerManagement: 2
DynamicPowerManagementVideoMemoryThreshold: 200
RegisterPCIDriver: 1
EnablePCIERelaxedOrderingMode: 0
EnableGpuFirmware: 2
RegistryDwords: ""
RegistryDwordsPerDevice: ""
RmMsg: ""
GpuBlacklist: ""
TemporaryFilePath: "/var/tmp"
ExcludedGpus: ""
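Only two entries in that dump matter for this thread, `PreserveVideoMemoryAllocations` and `TemporaryFilePath`. As a small convenience, here is a sketch that pulls a single key out of the dump. The helper name `nv_param` is hypothetical, and it assumes the `Key: value` layout shown above.

```shell
#!/bin/sh
# Sketch: print the value of one key from the NVIDIA params dump.
# nv_param is a hypothetical helper; the second argument lets you point it
# at a saved copy of the dump instead of the live procfs file.
nv_param() {
    key="$1"
    params_file="${2:-/proc/driver/nvidia/params}"
    # Lines look like "Key: value"; match the key exactly, print the value.
    awk -F': ' -v k="$key" '$1 == k { print $2 }' "$params_file"
}
```

With the dump above, `nv_param PreserveVideoMemoryAllocations` prints `1` and `nv_param TemporaryFilePath` prints `"/var/tmp"` (quotes included), so the module option did take effect.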
IDK, what kills /usr/bin/nvidia-sleep.sh with SIGKILL?
How can I see what kills the sh script?
Moreover, what do you think about disabling PreserveVideoMemoryAllocations? That way I can disable the three systemd services and see if it works.
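On seeing what kills the script: the unit's journal records the kill itself, and whether it also names the sender depends on what sent the signal. A rough starting point, assuming the journal lines have the same shape as the status log posted earlier, is to filter them for killed main processes and then read the surrounding journal entries at that timestamp. The function name `killed_units` is made up for this sketch.

```shell
#!/bin/sh
# Sketch: read journal text on stdin and print "<unit> <signo>/<name>" for
# every service whose main process was killed by a signal.
# killed_units is a hypothetical helper; it only matches lines shaped like
# "... foo.service: Main process exited, code=killed, status=9/KILL".
killed_units() {
    sed -n 's|.* \([^ ]*\.service\): Main process exited, code=killed, status=\([0-9]*/[A-Z0-9]*\).*|\1 \2|p'
}
```

Typical use would be `journalctl -b -u nvidia-suspend.service | killed_units`, followed by a plain `journalctl -b` around the reported time to see what else was logged in the same second.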
I found another thread on reddit with the same stack trace as mine: https://www.reddit.com/r/pop_os/comments/uj4gld/another_broken_install_after_kernel_517_update/
Apparently, the problem is related to kernel 5.16/5.17. However, I am not able to revert to 5.16 to test it because the update removed all the older kernel versions.
Should this bug be addressed by NVIDIA?
Actually, the mentioned backtrace is not caught in the logs, only that nvidia-sleep.sh is killed, which makes it hard to find any cause. AFAIK, kernel 5.15 should be the 22.04 stock kernel, so you should be able to return to it by running
sudo apt install --install-recommends linux-generic
Hi there!
I have the same problem with the RTX 2070 in my laptop.
My configuration so far:
nvidia-smi
Sat May 13 10:22:46 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.116.04   Driver Version: 525.116.04   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:07:00.0  On |                  N/A |
| N/A   80C    P0   114W / 115W |   1646MiB /  8192MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2184      G   /usr/lib/xorg/Xorg                 95MiB |
|    0   N/A  N/A      2527      G   /usr/bin/gnome-shell                4MiB |
|    0   N/A  N/A      2815      G   /usr/lib/xorg/Xorg                160MiB |
|    0   N/A  N/A      2948      G   /usr/bin/gnome-shell               53MiB |
|    0   N/A  N/A      5798      G   /usr/lib/firefox/firefox          198MiB |
|    0   N/A  N/A     22083      G   ...b/thunderbird/thunderbird      145MiB |
|    0   N/A  N/A     22444      C   ...le/amicable_OpenCL_v_3_02      982MiB |
+-----------------------------------------------------------------------------+
I read the dmesg output and always found this after a bad suspend-resume cycle:
[ 1741.504265] NVRM: GPU 0000:07:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to the 'Configuring Power Management Support' section in the driver README.
[ 1741.504267] PM: pci_pm_suspend(): nv_pmops_suspend+0x0/0x30 [nvidia] returns -5
[ 1741.504496] PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -5
[ 1741.504527] nvidia 0000:07:00.0: PM: failed to suspend: error -5
[ 1741.504530] PM: Some devices failed to suspend, or early wake event detected
Confusingly, the nvidia-suspend, nvidia-hibernate and nvidia-resume services are all loaded and enabled but “dead” according to systemctl status:
● nvidia-suspend.service - NVIDIA system suspend actions
Loaded: loaded (/lib/systemd/system/nvidia-suspend.service; enabled; vendor preset: enabled)
Active: inactive (dead)
● nvidia-hibernate.service - NVIDIA system hibernate actions
Loaded: loaded (/lib/systemd/system/nvidia-hibernate.service; enabled; vendor preset: enabled)
Active: inactive (dead)
● nvidia-resume.service - NVIDIA system resume actions
Loaded: loaded (/lib/systemd/system/nvidia-resume.service; enabled; vendor preset: enabled)
Active: inactive (dead)
I did all of the above, but I am on kernel 5.13 in Ubuntu 20.04 LTS.
Any ideas yet?
cheers
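A note on the “inactive (dead)” state in the status output above: as far as I know these three units are one-shot services that systemd only starts around a suspend, hibernate or resume transition, so being inactive the rest of the time is expected and not a fault in itself. What matters is that they are enabled. A minimal sketch to check that in one go, using only `systemctl is-enabled` (the function name `nvidia_pm_units_state` is made up for illustration):

```shell
#!/bin/sh
# Sketch: print the enablement state of the three NVIDIA power management units.
# nvidia_pm_units_state is a hypothetical helper name.
nvidia_pm_units_state() {
    for unit in nvidia-suspend.service nvidia-hibernate.service nvidia-resume.service; do
        # is-enabled prints a state such as "enabled" or "disabled";
        # fall back to "unknown" if it prints nothing (e.g. unit not installed).
        state="$(systemctl is-enabled "$unit" 2>/dev/null || true)"
        printf '%s %s\n' "$unit" "${state:-unknown}"
    done
}
```

If all three report `enabled` and the NVRM “without driver procfs suspend interface” message still appears, the next thing to look at is whether `nvidia-suspend.service` actually ran and succeeded during the failed suspend, for example with `journalctl -b -u nvidia-suspend.service`.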