RTX4090 - GPU fans to max and "GPU has fallen off the bus"

I have a System76 Mira R3 machine with an RTX4090 GPU. It has been working great for about 3 weeks since I got it, but all of a sudden a couple of days ago it started having problems. The GPU fan will spin up to what sounds like max speed and the GPU will stop responding. nvidia-smi just gives an error “Unable to determine the device handle for GPU0000:01:00.0: Unknown Error” and I see this in the system log:

[Fri Jul  7 14:06:55 2023] NVRM: GPU at PCI:0000:01:00: GPU-a8a22861-ba33-c1b6-e2f7-ec993989ad48
[Fri Jul  7 14:06:55 2023] NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
[Fri Jul  7 14:06:55 2023] NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
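
In case it helps anyone else catch this, here is roughly how I watch for these events; it’s just a sketch assuming a standard systemd setup, and the PCI address in the log above is specific to my machine:

# Follow kernel messages live and flag NVRM/Xid lines
sudo journalctl -k -f | grep -E 'NVRM|Xid'

# Or search the current boot's kernel log after the fact
sudo journalctl -k -b | grep -i 'fallen off the bus'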

Everything else keeps working while this is going on, but when I try to shut down, the system hangs.

I’m attaching the log output from nvidia-bug-report.sh.

This is a Pop!_OS 22.04 Linux system that I bought about a month ago. I am using the 4090 for machine learning model training. There are no monitors connected to the 4090. I also have a GT 1030 in the system driving a second monitor, and it’s having no problems. I haven’t been able to track down anything that triggers it: it has happened while I’m training a model, while I’m just reading email, and even when the system is sitting idle. Is there anything I can do to fix this, or is the GPU itself bad and in need of replacement?

nvidia-bug-report.log.gz (254.7 KB)

Hi, I have the exact same issue (but with a 5070 Ti). Did you manage to find the cause?

Hi @kincaid.dave and @regunakyle ,
Thanks for reporting this issue.
Could you please attach a bug report generated with the latest r580 driver?

Hi @vanditd, this has also been happening to me for the past month, but with dual RTX 5060 Ti 16 GB GPUs. Everything was working perfectly prior to mid-October, and even rolling back drivers and kernels hasn’t resolved it.

Sometimes it happens immediately upon loading the NVIDIA drivers (by running nvidia-smi) from a cold boot; other times it happens when initiating a CUDA workload. It has never happened to the GPU in the first PCIe slot; it only seems to happen to the GPU in the second slot. I’ve bought five RTX 5060 Ti 16 GB GPUs (1x Gigabyte, 4x PNY), and regardless of which card is in the second slot, it’s always the one in the second slot that falls off the bus. When any of the GPUs is installed by itself, it works as expected. I’ve re-seated the GPU many, many times; same behavior.

On some boots (very, very rare) the second GPU doesn’t even show up in the lspci listing.
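
For reference, this is roughly how I check whether the second card enumerated at all and what its link state looks like; the bus address below is just a placeholder, since it will differ per system:

# List every NVIDIA device the kernel can see (10de is NVIDIA's PCI vendor ID)
lspci -d 10de:

# Inspect the link width/speed of a specific device; replace 02:00.0 with the second slot's address
sudo lspci -vvv -s 02:00.0 | grep -i lnk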

I have replaced the original 750 W PSU with an 850 W PSU and installed the latest BIOS for the motherboard. The reason I have five GPUs is that I have two identical dual-GPU builds for high availability; if one needs to go down, the VMs/containers can be migrated to the backup. I’m experiencing the same behavior on both systems.

I’ve tried disabling ASPM in the BIOS and re-enabling it. I’ve turned auto-negotiation of the PCIe generation on and off. CSM is disabled. I’ve tried kernel command-line parameters to no avail. I tried the new CUDA_DISABLE_PERF_BOOST environment variable, again to no avail. I can’t configure power usage via nvidia-smi because running that tool is most often what triggers the GPU falling off the bus.
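
For completeness, this is the sort of power-limit command I would run if the card stayed on the bus long enough to accept it; the GPU index and the 150 W cap are just example placeholders:

# Query the supported power-limit range first
nvidia-smi -i 1 -q -d POWER

# Then cap the board power (must be within the supported range; needs root)
sudo nvidia-smi -i 1 -pl 150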

I’ve tried the NVIDIA drivers from 580.82.09 up to the latest 580.105.08; identical behavior. I’ve tried this on Debian 12 and 13 systems (specifically Proxmox 8.4 and 9.0) running Linux kernels 6.8.12 and 6.14.11, respectively.

I’m not sure how valuable the bug report will be, as the problematic GPU is not reachable for communication, but I’ve attached it regardless.

nvidia-bug-report.log.gz (457.6 KB)