GPU has fallen off the bus

NVRM: GPU at PCI:0000:65:00: GPU-cd57429b-a4d9-917d-72d6-1d9b6c4f6a3a
NVRM: GPU Board Serial Number:
NVRM: Xid (PCI:0000:65:00): 79, GPU has fallen off the bus.
NVRM: GPU at 0000:65:00.0 has fallen off the bus.
NVRM: GPU is on Board .
NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
sched: RT throttling activated

Then Linux crashed.
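
Because the machine crashed, the Xid lines may only be recoverable from the previous boot's kernel log. The filter below is a minimal sketch for pulling the bus address and Xid code out of kernel log text. `xid_events` is a hypothetical helper, and it assumes the `NVRM: Xid (PCI:...): <code>, ...` format seen in the log above.

```shell
#!/usr/bin/env bash
# Sketch: list Xid events from kernel log text read on stdin.
# Assumes the message format seen in the log above:
#   NVRM: Xid (PCI:0000:65:00): 79, GPU has fallen off the bus.
# Prints one line per event: "<pci-address> <xid-code>".
xid_events() {
  sed -n 's/.*NVRM: Xid (PCI:\([0-9a-fA-F:.]*\)): \([0-9]*\),.*/\1 \2/p'
}

# Usage, previous boot (only works with a persistent systemd journal,
# so kernel messages survive the crash):
#   journalctl -k -b -1 | xid_events
# Usage, current boot:
#   sudo dmesg | xid_events
```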

$ lspci -vv | grep -w -A2 NVIDIA
17:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: ZOTAC International (MCO) Ltd. GP102 [GeForce GTX 1080 Ti]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
--
17:00.1 Audio device: NVIDIA Corporation GP102 HDMI Audio Controller (rev a1)
	Subsystem: ZOTAC International (MCO) Ltd. GP102 HDMI Audio Controller
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
--
18:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: ZOTAC International (MCO) Ltd. GP102 [GeForce GTX 1080 Ti]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
--
18:00.1 Audio device: NVIDIA Corporation GP102 HDMI Audio Controller (rev a1)
	Subsystem: ZOTAC International (MCO) Ltd. GP102 HDMI Audio Controller
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
--
65:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: ZOTAC International (MCO) Ltd. GP102 [GeForce GTX 1080 Ti]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
--
65:00.1 Audio device: NVIDIA Corporation GP102 HDMI Audio Controller (rev a1)
	Subsystem: ZOTAC International (MCO) Ltd. GP102 HDMI Audio Controller
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
--
b4:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: ZOTAC International (MCO) Ltd. GP102 [GeForce GTX 1080 Ti]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
--
b4:00.1 Audio device: NVIDIA Corporation GP102 HDMI Audio Controller (rev a1)
	Subsystem: ZOTAC International (MCO) Ltd. GP102 HDMI Audio Controller
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
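
The bus address in the Xid message (PCI:0000:65:00) matches the third of the four cards listed above. Assuming the full `lspci -vv` output rather than the `grep -A2` excerpt, a filter along these lines narrows that device down to its PCIe link lines, which show the ASPM state and the negotiated link speed and width. `link_lines` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: keep only the PCIe link lines (LnkCap/LnkCtl/LnkSta) from
# `lspci -vv` output read on stdin. LnkCtl shows the ASPM setting;
# LnkSta shows the negotiated link speed and width.
link_lines() {
  grep -E '^[[:space:]]*(LnkCap|LnkCtl|LnkSta):'
}

# Usage for the GPU named in the Xid message (bus 65:00.0). Root is
# needed for lspci to read the extended capabilities:
#   sudo lspci -vv -s 65:00.0 | link_lines
```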

nvidia-bug-report.log.gz (276 KB)

Xid 79 often points to overheating or an insufficient or unstable power supply.
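
One way to test the thermal and power theory is to log sensor readings until the next crash. A minimal sketch, assuming `nvidia-smi` is installed; `flag_hot` and the 85 C threshold are illustrative choices, not NVIDIA guidance.

```shell
#!/usr/bin/env bash
# Sketch: flag GPUs whose temperature is at or above a threshold.
# Reads CSV lines produced by:
#   nvidia-smi --query-gpu=index,pci.bus_id,temperature.gpu,power.draw,power.limit \
#              --format=csv,noheader,nounits
# The threshold (first argument, default 85 C) is an assumption.
flag_hot() {
  local limit="${1:-85}"
  awk -F', *' -v limit="$limit" '
    $3 + 0 >= limit + 0 {
      printf "GPU %s (%s): %s C, %s W of %s W\n", $1, $2, $3, $4, $5
      fflush()  # print each warning immediately when reading a live stream
    }'
}

# Usage: poll every 5 seconds, keep all readings in a (hypothetical)
# log file, and print a line whenever a GPU crosses the threshold:
#   nvidia-smi --query-gpu=index,pci.bus_id,temperature.gpu,power.draw,power.limit \
#              --format=csv,noheader,nounits -l 5 | tee -a gpu-sensors.log | flag_hot 85
```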

I had another one.

NVRM: GPU at PCI:0000:65:00: GPU-cd57429b-a4d9-917d-72d6-1d9b6c4f6a3a
NVRM: GPU Board Serial Number: 
NVRM: Xid (PCI:0000:65:00): 79, GPU has fallen off the bus.
NVRM: GPU at 0000:65:00.0 has fallen off the bus.
NVRM: GPU is on Board .
NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.

Full log: https://gist.github.com/kenorb/9b41910fbced376314b7dda50ccad2cd

I will check the BIOS settings next time.

The following post suggests it’s an issue with ASUS motherboards:

It suggests adding this kernel boot option:

pcie_aspm=off

I’ll try that as well.
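
A sketch of applying and verifying the option, assuming a GRUB-based boot. `has_kernel_param` is a hypothetical helper, and file paths and command names vary by distribution (Fedora, for example, uses `grub2-mkconfig`).

```shell
#!/usr/bin/env bash
# Sketch: check whether a kernel parameter is present in a kernel
# command line (by default, the running kernel's /proc/cmdline).
has_kernel_param() {
  local param="$1"
  local cmdline="${2:-$(cat /proc/cmdline)}"
  # Pad with spaces so only whole, space-separated words match.
  case " $cmdline " in
    *" $param "*) return 0 ;;
    *) return 1 ;;
  esac
}

# To add the option with GRUB:
#   1. In /etc/default/grub, append it to GRUB_CMDLINE_LINUX_DEFAULT, e.g.
#        GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_aspm=off"
#   2. Regenerate the config:
#        sudo update-grub                            # Debian/Ubuntu
#        sudo grub-mkconfig -o /boot/grub/grub.cfg   # e.g. Gentoo, Arch
#   3. Reboot, then confirm it is active:
#        has_kernel_param pcie_aspm=off && echo "pcie_aspm=off is set"
```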

Screenshots from the NVIDIA X Server Settings app of the failing GPU (the first of four):



Did you fix this issue? I’m having it on 4.29 and the latest 5.x kernels, with every nvidia-drivers version available on my Gentoo system. Strangely, when I run gpu_burn, a CUDA stress tester, everything is OK. The problem only occurs when I start X-based software (Xorg or Plasma).