NVRM: GPU at PCI:0000:65:00: GPU-cd57429b-a4d9-917d-72d6-1d9b6c4f6a3a
NVRM: GPU Board Serial Number:
NVRM: Xid (PCI:0000:65:00): 79, GPU has fallen off the bus.
NVRM: GPU at 0000:65:00.0 has fallen off the bus.
NVRM: GPU is on Board .
NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
sched: RT throttling activated
NVRM: GPU at PCI:0000:65:00: GPU-cd57429b-a4d9-917d-72d6-1d9b6c4f6a3a
NVRM: GPU Board Serial Number:
NVRM: Xid (PCI:0000:65:00): 79, GPU has fallen off the bus.
NVRM: GPU at 0000:65:00.0 has fallen off the bus.
NVRM: GPU is on Board .
NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
Did you fix this issue? I am having this on 4.29 and the latest 5 kernels with all nvidia-drivers available on a Gentoo system. The strange thing is that when I try gpu_burn, which is a CUDA stress tester, all is OK. The problem only occurs when I start X-based stuff (Xorg or Plasma).
I have this exact same issue every day in Ubuntu 22.04 since I got a GeForce RTX 3060, with every kernel and every Nvidia video driver in the 5xx range. I have tried many different kernel/videodriver combinations and they all have the same problem.
The freeze usually occurs within 2 hours of booting the computer. I have tried multiple boot options that I read around the net had solved the issue for others, such as the famous pcie_aspm=off, but they don’t make a difference in my case. That may be because I don’t have an ASUS mainboard like most people who report this issue. I have a Gigabyte X570 I Aorus Pro instead.
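For anyone trying boot options like pcie_aspm=off, it may be worth confirming that the option actually reached the running kernel before concluding it has no effect. Below is a minimal sketch, assuming a Linux system where /proc/cmdline is readable. The helper name kernel_param is made up for this example and is not part of any tool.

```python
from pathlib import Path


def kernel_param(name, cmdline=None):
    """Look up a boot parameter on the kernel command line.

    Returns the value as a string ("off" for pcie_aspm=off), True for a
    bare flag such as "quiet", or None if the parameter is absent.
    """
    if cmdline is None:
        # /proc/cmdline holds the command line the running kernel booted with.
        cmdline = Path("/proc/cmdline").read_text()
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            # Keep the last occurrence if the parameter is repeated.
            value = val if sep else True
    return value


# Example use on the affected machine:
#   print("pcie_aspm =", kernel_param("pcie_aspm"))
```

If this reports None after a reboot, the bootloader configuration was probably not regenerated, so the option was never tested in the first place.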
While it sucks to a pretty infuriating level that this problem persists for such a long time over so many updates, I have discovered something interesting. The issue never reappears after a soft reboot.
So I do a Sync, Unmount and reBootSysRequest (i.e. hold Alt + SysRq while pressing S, U, and B in slow succession) and the problem is gone until I do a cold boot.
I know this topic is old, but it’s still the first result DuckDuckGo gives me. I thought I’d share that new information here.
10:06:20 kernel: [ 3901.114072] NVRM: GPU at PCI:0000:09:00: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
10:06:20 kernel: [ 3901.114079] NVRM: Xid (PCI:0000:09:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
10:06:20 kernel: [ 3901.114081] NVRM: GPU 0000:09:00.0: GPU has fallen off the bus.
10:06:20 kernel: [ 3901.114739] NVRM: Xid (PCI:0000:09:00): 32, pid=3502, name=cinnamon, Channel ID 00000010 intr 00800000
10:09:52 kernel: [ 4113.229473] sysrq: Emergency Sync
10:09:52 kernel: [ 4113.229691] Emergency Sync complete
10:09:54 kernel: [ 4114.232443] sysrq: Emergency Remount R/O
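To compare logs like the ones in this thread, the Xid lines can be pulled out of a saved kernel log with a small script. This is only a sketch: the regular expression is written against the message format shown in the excerpts here, which may differ between driver versions, and the function name parse_xid_events is invented for this example.

```python
import re

# Matches lines such as:
#   NVRM: Xid (PCI:0000:09:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
# The pattern is an assumption based on the excerpts in this thread.
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+),\s*(?P<detail>.*)"
)


def parse_xid_events(log_text):
    """Return a list of (pci_address, xid_code, detail) tuples found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = XID_RE.search(line)
        if match:
            events.append(
                (match.group("pci"), int(match.group("xid")), match.group("detail").strip())
            )
    return events


# Example use with a saved log (the file name is hypothetical):
#   with open("kernel.log") as f:
#       for pci, xid, detail in parse_xid_events(f.read()):
#           print(pci, xid, detail)
```

On the excerpt above this would report Xid 79 followed by Xid 32 for the device at 0000:09:00, which makes it easier to see which error came first.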
Correction: my claim that the issue never reappears after a soft reboot is false. I was just lucky for some time. See this thread for more on this.
The safest way to reboot a frozen machine is still a Sync, Unmount and reBootSysRequest (i.e. hold Alt + SysRq while pressing S, U, and B) as long as the kernel is still running.
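If the display is frozen and the keyboard no longer responds, but the machine can still be reached some other way (over SSH, for example; that is an assumption, not something reported above), the same S, U, B sequence can be sent by writing the letters to /proc/sysrq-trigger as root. A rough sketch follows. The function name sysrq_sequence and the 2-second delay are my own choices, not anything prescribed by the kernel.

```python
import time

SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def sysrq_sequence(keys="sub", trigger_path=SYSRQ_TRIGGER, delay=2.0):
    """Write each SysRq command letter to the trigger file, pausing in between.

    's' = emergency sync, 'u' = emergency remount read-only, 'b' = reboot.
    Needs root. With the default path, the final 'b' reboots the machine
    immediately, so nothing after it will run.
    """
    for key in keys:
        with open(trigger_path, "w") as trigger:
            trigger.write(key)
        # Give sync/remount some time to finish before the next command.
        time.sleep(delay)


# Example use (as root, on the frozen machine):
#   sysrq_sequence()
```

The pause plays the same role as pressing the keys in slow succession: the sync and remount steps get a moment to complete before the reboot is requested.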