GeForce RTX 3060 Error: GPU:0: Error while waiting for GPU progress: 0x0000c67d

Hello,

I am using Ubuntu on an ASUS TUF Gaming F15 FX506HM laptop.
Sometimes I cannot shut down my laptop and it freezes completely. After a while, the error message from the title is printed repeatedly on the black screen:

nvidia-modeset: Error: GPU:0: Error while waiting for GPU progress: 0x0000c67d 2:0:4048:4040

Once it is in this state, the laptop cannot be rebooted or shut down normally. I can only turn it off by holding the power button for 5 seconds.

Here is the distribution info:
can@can-tuf-fx506hm:~$ sudo lsb_release -a
Description:    Ubuntu 22.04.1 LTS
Release:        22.04
Codename:       jammy

Here is the list of PCI devices:
can@can-tuf-fx506hm:~$ sudo lspci
0000:00:00.0 Host bridge: Intel Corporation 11th Gen Core Processor Host Bridge/DRAM Registers (rev 05)
0000:00:01.0 PCI bridge: Intel Corporation 11th Gen Core Processor PCIe Controller #1 (rev 05)
0000:00:02.0 VGA compatible controller: Intel Corporation TigerLake-H GT1 [UHD Graphics] (rev 01)
0000:00:04.0 Signal processing controller: Intel Corporation TigerLake-LP Dynamic Tuning Processor Participant (rev 05)
0000:00:06.0 System peripheral: Intel Corporation Device 09ab
0000:00:07.0 PCI bridge: Intel Corporation Tiger Lake-H Thunderbolt 4 PCI Express Root Port #0 (rev 05)
0000:00:08.0 System peripheral: Intel Corporation GNA Scoring Accelerator module (rev 05)
0000:00:0a.0 Signal processing controller: Intel Corporation Tigerlake Telemetry Aggregator Driver (rev 01)
0000:00:0d.0 USB controller: Intel Corporation Tiger Lake-H Thunderbolt 4 USB Controller (rev 05)
0000:00:0d.2 USB controller: Intel Corporation Tiger Lake-H Thunderbolt 4 NHI #0 (rev 05)
0000:00:0e.0 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller
0000:00:14.0 USB controller: Intel Corporation Tiger Lake-H USB 3.2 Gen 2x1 xHCI Host Controller (rev 11)
0000:00:14.2 RAM memory: Intel Corporation Tiger Lake-H Shared SRAM (rev 11)
0000:00:14.3 Network controller: Intel Corporation Tiger Lake PCH CNVi WiFi (rev 11)
0000:00:15.0 Serial bus controller: Intel Corporation Tiger Lake-H Serial IO I2C Controller #0 (rev 11)
0000:00:16.0 Communication controller: Intel Corporation Tiger Lake-H Management Engine Interface (rev 11)
0000:00:1d.0 PCI bridge: Intel Corporation Device 43b6 (rev 11)
0000:00:1f.0 ISA bridge: Intel Corporation Tiger Lake-H LPC/eSPI Controller (rev 11)
0000:00:1f.3 Audio device: Intel Corporation Tiger Lake-H HD Audio Controller (rev 11)
0000:00:1f.4 SMBus: Intel Corporation Tiger Lake-H SMBus Controller (rev 11)
0000:00:1f.5 Serial bus controller: Intel Corporation Tiger Lake-H SPI Controller (rev 11)
0000:01:00.0 VGA compatible controller: NVIDIA Corporation GA106M [GeForce RTX 3060 Mobile / Max-Q] (rev a1)
0000:01:00.1 Audio device: NVIDIA Corporation Device 228e (rev a1)
0000:2d:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor PCIe Controller #0 (rev 05)
10000:e1:00.0 Non-Volatile memory controller: SK hynix Gold P31 SSD

The latest version of the NVIDIA 515 driver is installed:

can@can-tuf-fx506hm:~$ sudo apt-get install nvidia-driver-515
nvidia-driver-515 is already the newest version (515.65.01-0ubuntu0.22.04.1).

Can you help me fix this black screen / frozen laptop problem?

Best Regards,
Can

Please run nvidia-bug-report.sh as root after that happens and attach the resulting nvidia-bug-report.log.gz file to your post.

It happened again today. I waited about 5 minutes and then restarted the laptop by holding the power button for 5 seconds.
Here is the result of running
sudo nvidia-bug-report.sh

nvidia-bug-report.log.gz (372.1 KB)

Aug 11 18:48:08 can-tuf-fx506hm kernel: NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
Aug 11 18:48:08 can-tuf-fx506hm kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
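
To check whether the same Xid shows up in other logs (for example in saved journal output from earlier freezes), the lines can be filtered with a small script. This is only a rough sketch: the regular expression is modelled on the single log line quoted above, and the function name find_xid_events is made up for illustration.

```python
import re

# Pattern modelled on the NVRM Xid line quoted above (an assumption:
# other driver versions may format the line differently).
XID_RE = re.compile(r"NVRM: Xid \(PCI:(?P<bus>[0-9a-fA-F:.]+)\): (?P<code>\d+),")


def find_xid_events(log_text):
    """Return (pci_address, xid_code) tuples for every Xid line in log_text."""
    return [(m.group("bus"), int(m.group("code"))) for m in XID_RE.finditer(log_text)]


# The two lines from the bug report, used here as sample input.
sample = (
    "Aug 11 18:48:08 can-tuf-fx506hm kernel: NVRM: Xid (PCI:0000:01:00): 79, "
    "pid='<unknown>', name=<unknown>, GPU has fallen off the bus.\n"
    "Aug 11 18:48:08 can-tuf-fx506hm kernel: NVRM: GPU 0000:01:00.0: "
    "GPU has fallen off the bus.\n"
)

print(find_xid_events(sample))
```

Only the first of the two lines carries an Xid code, so only that one is reported. To scan a real log, pass the text of a saved kernel log to find_xid_events instead of the sample.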

Since it’s a notebook, this looks like the GPU is beginning to die.
If it is still under warranty, try to reproduce the problem using Windows, then have it replaced by ASUS.

I forgot to mention that this error happens randomly, and only when Ubuntu goes into suspend, power saving, or hibernate mode after some time of inactivity. Then sometimes I cannot wake the laptop up anymore.
Could this be a driver issue related to power management with the NVIDIA drivers under Linux?

I fell back to the open-source driver and did not have any issues with it for a long while, so I do not think this is a hardware issue.
I can also try to reproduce it with Windows. I have not installed Windows on it yet, but I can try with one of my old Windows 10 licenses…

I have been using this laptop for only about 5 months since I bought it.
If it is beginning to die, I would probably avoid any product made by NVIDIA and ASUS in the future.
That combination worked for me for more than 10 years on my previous laptop.

If it only happens after suspend, then it rather sounds like a BIOS issue. Please check for an update.
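
To compare the installed BIOS with the latest one offered on the vendor's support page, the current version can be read from sysfs. This is a minimal sketch, assuming a Linux system that exposes DMI data under /sys/class/dmi/id. The function name read_bios_info is made up for illustration, and fields that cannot be read are reported as None.

```python
from pathlib import Path


def read_bios_info(dmi_dir="/sys/class/dmi/id"):
    """Read BIOS vendor, version and date from the DMI sysfs directory."""
    info = {}
    for field in ("bios_vendor", "bios_version", "bios_date"):
        try:
            info[field] = (Path(dmi_dir) / field).read_text().strip()
        except OSError:
            # File missing or unreadable (e.g. not Linux, or no DMI support).
            info[field] = None
    return info


if __name__ == "__main__":
    for key, value in read_bios_info().items():
        print(f"{key}: {value}")
```

Running it prints the three fields, which can then be checked against the BIOS version listed for the FX506HM on the download page.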