GeForce RTX 3060 Laptop freezes when powering off and suspending on Linux

Hi all,
I have a laptop, a Gigabyte Aero 15 KD, with a GeForce RTX 3060 graphics card. This card has been giving me problems on multiple Linux distros (Pop!_OS, Ubuntu, Debian).
When I let the computer turn off the screen, sometimes the screen doesn’t turn back on. In this situation I can still access the computer via SSH, and running “nvidia-smi” after the screen blanks gives “Unable to determine the device handle for gpu for GPU0000:01:00.0: Unknown error”. I also ran nvidia-bug-report.sh and attached the log to this question (see the log line “Mar 01 22:00:37 pop-os systemd[1]: nvidia-resume.service: Deactivated successfully.”; the nvidia-modeset errors start after it). The device also sometimes freezes when powering off, and I believe the cause is the same.
I had, however, a setup that didn’t give me any issues using Pop!_OS 20.04 + kernel 6.0.12-76060012-generic and Nvidia driver 535.86.05.

My current setup is:

  • Distro: Pop!_OS 22.04 LTS
  • Kernel: 6.6.10-76060610-generic
  • Nvidia driver: 545.29.06
    nvidia-bug-report.log (6.6 MB)
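
For anyone trying to catch the same state over SSH, here is a minimal sketch of the check described above: run nvidia-smi and look for the device-handle error. The function names and the marker substring are my own choices for illustration; the exact wording of the error may differ between driver versions, so treat the marker as an assumption.

```python
import subprocess

# Substring taken from the error quoted in this thread. The exact wording
# may vary between driver versions (assumption), so only a prefix is matched.
LOST_GPU_MARKER = "Unable to determine the device handle"


def gpu_looks_lost(smi_output: str) -> bool:
    """Return True if the nvidia-smi output contains the device-handle error."""
    return LOST_GPU_MARKER.lower() in smi_output.lower()


def run_nvidia_smi(timeout_s: float = 10.0) -> str:
    """Run nvidia-smi and return stdout and stderr combined as text."""
    try:
        result = subprocess.run(
            ["nvidia-smi"],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except OSError:
        # nvidia-smi is missing or cannot be executed.
        return "nvidia-smi could not be started"
    except subprocess.TimeoutExpired:
        # A hung driver can make nvidia-smi block, so give up after the timeout.
        return "nvidia-smi timed out"
    return result.stdout + result.stderr


if __name__ == "__main__":
    output = run_nvidia_smi()
    if gpu_looks_lost(output):
        print("GPU lost: device-handle error reported")
    else:
        print("No device-handle error seen")
```

Running this from a cron job or a loop in an SSH session right after the screen blanks would at least timestamp when the GPU drops out.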

The nvidia gpu fails to power on again. Please attach an acpidump output.

Sorry for the late response… I have been trying multiple driver versions in the hope of temporarily solving the problem, without success. I formatted my laptop again and restored the same setup mentioned earlier. It took a while, but it is now giving me the same issue as before.
Here is the acpidump output before the issue:
acpidump_before.txt (2.4 MB)
Here is the nvidia-bug-report after:
nvidia-bug-report.log (10.2 MB)
Here is the acpidump output after the issue:
acpidump_after.txt (2.4 MB)

The gpu not turning on after sleep is a kernel issue, so changing driver versions won’t help. You would need to return to an earlier kernel.

Ok, I will try that. Thanks!

To see what happens, I returned to the most stable setup that I had found so far (Pop!_OS 20.04 + kernel 6.0.12-76060012-generic and Nvidia driver 535.86.05). It turns out this also gives me issues, but much less frequently. I started by trying multiple nvidia driver versions (I’m a beginner and it is easier for me to switch nvidia drivers than kernels on this distro): 525, 535, 545, and 550. The behavior was indeed similar across these versions. I then switched the kernel version, and the lower the kernel version, the more frequently the issues appeared. I concluded that it is indeed a kernel issue and installed Manjaro with a more up-to-date kernel, which is the most stable setup I have had so far. Do you know what I should learn to understand the problem, so that I can report it more accurately?

Difficult, the kernel doesn’t even seem to notice the gpu didn’t power on again. Since you say “more reliable” and “less reliable”, does this not always happen? Sometimes the gpu is on, sometimes not? Please create a new nvidia-bug-report.log in the working state, right after a fresh boot.
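
Alongside the bug report, a rough way to see what the driver logged around a resume is to filter the kernel log for NVIDIA messages. The sketch below assumes a systemd system where `journalctl -k -b` returns the kernel messages of the current boot; the tag list and function names are illustrative assumptions, not something taken from the attached logs.

```python
import subprocess

# "NVRM" and "nvidia-modeset" are prefixes the NVIDIA kernel modules use in
# the kernel log. The list is an assumption; extend it as needed.
DRIVER_TAGS = ("NVRM", "nvidia-modeset")


def driver_lines(log_text: str) -> list[str]:
    """Keep only the log lines that mention one of the driver tags."""
    return [
        line
        for line in log_text.splitlines()
        if any(tag in line for tag in DRIVER_TAGS)
    ]


def read_kernel_log() -> str:
    """Return kernel messages from the current boot, or '' if unavailable."""
    try:
        result = subprocess.run(
            ["journalctl", "-k", "-b", "--no-pager"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return result.stdout


if __name__ == "__main__":
    for line in driver_lines(read_kernel_log()):
        print(line)
```

Comparing the output from a fresh boot with the output after a failed resume should make it easier to spot where the two diverge.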

It always happened when using Pop!_OS 22.04 LTS (kernel 6.6.10-76060610-generic), but with Pop!_OS 20.04 + kernel 6.0.12-76060012-generic it didn’t always happen. I noticed that with Pop!_OS 20.04 I wasn’t using the GPU for the display (nvidia-smi showed only one Xorg process), while with Pop!_OS 22.04 I was (multiple processes, one for each graphical application), which may be related to the problem.

Using Manjaro with kernel 6.6.19-1-MANJARO (where only one process appears in nvidia-smi), it has so far never happened when I put the computer to sleep, even when using the graphics card to run PyTorch models. The problem now appears when I plug an external screen into the HDMI port: after a short time the second screen freezes and nvidia-smi gives the “Unable to determine the device handle for gpu for GPU0000:01:00.0: Unknown error” error. After that, the computer freezes when I turn it off. Here is the nvidia-bug-report taken before turning off the computer (while nvidia-smi shows the error): nvidia-bug-report.log (865.0 KB)

Sometimes the computer also freezes at random (with no external screen plugged in, and whether or not I am running PyTorch models), but as far as I know I have no way of telling whether it is a graphics card issue.

Here is the nvidia-bug-report for a fresh boot: fresh_boot.log (1.2 MB)

On EndeavourOS, a few days after reinstalling the distro, I am also having similar issues, but as a black screen after waking from suspend. I’m still not entirely sure it’s nvidia, but it already happened on suspend the other day (REISUB worked) and again today (REISUB did not work, system freeze).

Here is the nvidia-bug-report.sh file, hope it helps:
nvidia-bug-report.log.gz (952.6 KB)

And my system info:
EndeavourOS
DE: KDE 6.0.2
Lenovo Legion 5 15ach6h
AMD ryzen5 5600h
Nvidia rtx3060 laptop gpu
32Gb ram.

Other info:

Drivers were installed through the nvidia-inst package of EndeavourOS.

As for the scripts, I only have powerd.service enabled; the suspend, hibernate and resume ones are all disabled. /lib/systemd/system-sleep/nvidia is still there, though, and I’m not sure whether I should remove it.
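
To make that state easy to report, here is a small sketch that prints what systemd says about each unit and whether the sleep hook exists. The unit names are an assumption: they are the names the NVIDIA driver package normally ships (nvidia-suspend, nvidia-hibernate, nvidia-resume, nvidia-powerd), and I am guessing that "powerd.service" above refers to nvidia-powerd.service. The helper names are made up for illustration.

```python
import os
import subprocess

# Assumed unit names, as normally shipped with the NVIDIA driver.
UNITS = (
    "nvidia-suspend.service",
    "nvidia-hibernate.service",
    "nvidia-resume.service",
    "nvidia-powerd.service",
)

# Path of the sleep hook mentioned in this post.
SLEEP_HOOK = "/lib/systemd/system-sleep/nvidia"


def unit_state(unit: str) -> str:
    """Return the `systemctl is-enabled` state of a unit, or 'unknown'."""
    try:
        result = subprocess.run(
            ["systemctl", "is-enabled", unit],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return "unknown"
    # For a unit that does not exist, stdout is empty.
    return result.stdout.strip() or "unknown"


def summarize(states: dict[str, str]) -> list[str]:
    """Format unit states as sorted 'unit: state' lines."""
    return [f"{unit}: {state}" for unit, state in sorted(states.items())]


if __name__ == "__main__":
    for line in summarize({unit: unit_state(unit) for unit in UNITS}):
        print(line)
    print(f"{SLEEP_HOOK} exists: {os.path.exists(SLEEP_HOOK)}")
```

This only reports the current configuration; it does not say whether the hook should stay or go.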