The GPU falls off the bus at random times while running a demanding application (Cyberpunk 2077). The only way to recover from the resulting black screen is a cold boot, and even that is not guaranteed: sometimes several cold boots are needed before the GPU comes back up.
Hardware:
- CPU: AMD Ryzen 7 5800X 8-Core
- GPU: GeForce RTX 3070
- Motherboard: ASUS PRIME B450M-K II
- OS Type: Pop!_OS
Tried & failed methods:
- Different nvidia-drivers (nvidia-driver-560 & nvidia-driver-550-server via apt)
- Undervolted the GPU (180 W power limit & 1000-1800 MHz clock lock)
- Checked hardware for problems (No loose connection or cable)
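
For anyone wanting to reproduce the power limit and clock lock from the list above, here is a minimal sketch. It assumes the limits are applied with `nvidia-smi` (`-pl` for the power limit in watts, `-lgc` for the min,max GPU clock lock in MHz); the post does not say which tool was actually used, and the helper names `build_limit_commands` and `apply_limits` are made up for illustration.

```python
import subprocess


def build_limit_commands(power_watts, min_mhz, max_mhz, gpu_index=0):
    """Build nvidia-smi invocations for a power cap and a GPU clock lock.

    Assumption: nvidia-smi is the tool used; gpu_index=0 is a guess for a
    single-GPU machine.
    """
    gpu = str(gpu_index)
    return [
        # -pl sets the power limit in watts
        ["nvidia-smi", "-i", gpu, "-pl", str(power_watts)],
        # -lgc locks the GPU clock to a min,max range in MHz
        ["nvidia-smi", "-i", gpu, "-lgc", f"{min_mhz},{max_mhz}"],
    ]


def apply_limits(commands, dry_run=True):
    """Print the commands, or run them when dry_run is False (needs root)."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


# Values taken from the list above: 180 W, 1000-1800 MHz.
commands = build_limit_commands(180, 1000, 1800)
apply_limits(commands, dry_run=True)
```

With `dry_run=True` nothing is changed on the system; the two commands are only printed so they can be checked first.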
Here is the relevant snippet of journalctl:
Oct 08 00:27:10 kotias kernel: snd_hda_intel 0000:09:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0011 address=0xda82de64 flags=0x0020]
Oct 08 00:27:10 kotias kernel: snd_hda_intel 0000:09:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0011 address=0xda82de00 flags=0x0020]
Oct 08 00:27:10 kotias kernel: NVRM: GPU at PCI:0000:09:00: GPU-efe74fdc-6c45-4e43-4220-47aee04ee805
Oct 08 00:27:10 kotias kernel: NVRM: Xid (PCI:0000:09:00): 79, pid='', name=, GPU has fallen off the bus.
Oct 08 00:27:10 kotias kernel: NVRM: GPU 0000:09:00.0: GPU has fallen off the bus.
Oct 08 00:27:10 kotias kernel: snd_hda_intel 0000:09:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0011 address=0xda82de00 flags=0x0020]
Oct 08 00:27:10 kotias kernel: snd_hda_intel 0000:09:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0011 address=0xda82de00 flags=0x0020]
Oct 08 00:27:10 kotias kernel: snd_hda_intel 0000:09:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0011 address=0xda82de00 flags=0x0020]
Oct 08 00:27:10 kotias kernel: NVRM: A GPU crash dump has been created. If possible, please run NVRM: nvidia-bug-report.sh as root to collect this data before NVRM: the NVIDIA kernel module is unloaded.
Oct 08 00:27:10 kotias kernel: snd_hda_intel 0000:09:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0011 address=0xda82de00 flags=0x0020]
Oct 08 00:27:10 kotias /usr/libexec/gdm-x-session[10656]: (standard_in) 1: syntax error
Oct 08 00:27:10 kotias kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c67e:6:0:0x0000000f
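
To check how often this happens across boots, the log can be scanned for the two patterns seen above: the NVRM Xid line and the AMD-Vi IO_PAGE_FAULT events. Below is a rough sketch, assuming the log lines look like the ones in this snippet; the function names `find_xid_events` and `count_page_faults` are hypothetical, and the regular expressions are only fitted to this output, not to every driver version.

```python
import re
from collections import Counter

# Matches NVRM Xid lines such as:
#   NVRM: Xid (PCI:0000:09:00): 79, pid=..., GPU has fallen off the bus.
XID_RE = re.compile(r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<code>\d+),")

# Matches AMD-Vi page-fault events and captures the reporting PCI device.
FAULT_RE = re.compile(
    r"(?P<dev>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-9a-fA-F]): "
    r"AMD-Vi: Event logged \[IO_PAGE_FAULT"
)


def find_xid_events(lines):
    """Return (timestamp, pci_address, xid_code) for each Xid line."""
    events = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            # The syslog-style timestamp is the first three fields.
            timestamp = " ".join(line.split()[:3])
            events.append((timestamp, match.group("pci"), int(match.group("code"))))
    return events


def count_page_faults(lines):
    """Count IO_PAGE_FAULT events per PCI device."""
    counts = Counter()
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


# Two lines copied from the journal snippet above.
sample = [
    "Oct 08 00:27:10 kotias kernel: snd_hda_intel 0000:09:00.1: AMD-Vi: "
    "Event logged [IO_PAGE_FAULT domain=0x0011 address=0xda82de64 flags=0x0020]",
    "Oct 08 00:27:10 kotias kernel: NVRM: Xid (PCI:0000:09:00): 79, "
    "pid='', name=, GPU has fallen off the bus.",
]
xid_events = find_xid_events(sample)
fault_counts = count_page_faults(sample)
```

The input lines could come from saving the output of `journalctl -k -b -1` (kernel messages from the previous boot) to a file and reading it with `open(...).read().splitlines()`.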