Hello everyone,
We have a computer dedicated to machine learning and hash cracking for cybersecurity work, and we're facing an issue where the GPUs suddenly stop working. The logs show errors like these:
- NVRM: Xid (PCI:0000:01:00): 79, pid=‘’, name=, GPU has fallen off the bus.
- NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
We have tried changing the kernel version, undervolting the GPUs, and changing the NVIDIA drivers, but nothing works. The error happens randomly while the GPUs are under load.
After a reboot the GPUs work fine again, but the error keeps coming back.
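In case it helps anyone compare against their own logs, here is a minimal Python sketch that counts Xid 79 events per PCI address, which should show whether it is always the same card that drops. The names `parse_xid`, `count_by_gpu`, and `XidEvent` are only illustrative, and the regex is an assumption based solely on the log lines quoted above, so it may need adjusting for other driver versions.

```python
import re
from typing import Dict, Iterable, NamedTuple, Optional

# Assumed line shape, taken from the kernel log lines quoted above:
#   NVRM: Xid (PCI:0000:01:00): 79, pid=..., name=..., GPU has fallen off the bus.
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<code>\d+),\s*(?P<detail>.*)"
)


class XidEvent(NamedTuple):
    pci: str      # PCI address as printed in the log, e.g. "0000:01:00"
    code: int     # Xid code, e.g. 79
    detail: str   # remainder of the message after the code


def parse_xid(line: str) -> Optional[XidEvent]:
    """Return the Xid event found in a log line, or None if there is none."""
    m = XID_RE.search(line)
    if m is None:
        return None
    return XidEvent(m.group("pci"), int(m.group("code")), m.group("detail").strip())


def count_by_gpu(lines: Iterable[str], code: int = 79) -> Dict[str, int]:
    """Count events with the given Xid code, grouped by PCI address."""
    counts: Dict[str, int] = {}
    for line in lines:
        event = parse_xid(line)
        if event is not None and event.code == code:
            counts[event.pci] = counts.get(event.pci, 0) + 1
    return counts
```

To use it, save the kernel log to a text file (for example the output of `dmesg`) and call `count_by_gpu(open(path))` on it. Lines without an `Xid` entry, such as the second error line above, are simply skipped.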
The specs:
- Ubuntu 22.04.4 LTS
- Kernel: 5.15.0-118-generic
- 2x RTX 4090
- Intel i7-12700KF
- 32 GB DDR5 RAM
- Motherboard: Gigabyte Z690 AORUS ELITE AX
- PSU: EVGA 1600W
nvidia-bug-report.log (3.2 MB)