On my Linux box, I currently have a Quadro P4000 and it works perfectly.
I tried to upgrade my machine by installing an RTX A4000, but I had no luck.
I installed the RTX A4000 and at first it worked flawlessly. There was no need to install a new driver or to change settings, it simply worked. After a few hours, the screen filled with artifacts and the system crashed.
Now the card crashes even more frequently, making the system unusable, so I had to put the P4000 back in.
I’m clueless: I’d like to understand whether the problem is hardware related or software related.
Some hints:
- If I boot the system normally, the card crashes randomly, whatever task I’m doing
- The crashes do not appear to be related to GPU load (I tried both 3D modeling and LLMs and the card did not crash)
- nvidia-smi reports temperatures of around 37-50°C at most
- I ran gpu-burn in order to test the VRAM: the card did not crash and the report was OK.
- If I boot in recovery mode, in console mode only, the card does not seem to crash.
- journalctl reports tons of Xid 13 errors plus other Xid errors (check the attached log)
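To make the last hint easier to compare between runs, here is a minimal Python sketch that counts Xid codes in kernel log lines. It assumes the driver logs them in the form `NVRM: Xid (PCI:...): <code>, ...`. The names `count_xids`, `read_kernel_log` and `SAMPLE` are made up for illustration, and the sample lines are invented, not taken from the attached log.

```python
import re
import subprocess
from collections import Counter

# Assumed line shape: "... NVRM: Xid (PCI:0000:01:00): 13, pid=..., ..."
XID_RE = re.compile(r"NVRM: Xid \(([^)]*)\): (\d+)")


def count_xids(lines):
    """Return a Counter mapping Xid code (int) to number of occurrences."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


def read_kernel_log():
    """Read kernel messages from the journal (needs journalctl on PATH)."""
    result = subprocess.run(
        ["journalctl", "-k", "--no-pager"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


# Invented sample lines, only to show the expected input shape.
SAMPLE = [
    "kernel: NVRM: Xid (PCI:0000:01:00): 13, pid=1234, sample text",
    "kernel: usb 1-2: new high-speed USB device",
    "kernel: NVRM: Xid (PCI:0000:01:00): 31, pid=1234, sample text",
    "kernel: NVRM: Xid (PCI:0000:01:00): 13, pid=5678, sample text",
]

for code, n in sorted(count_xids(SAMPLE).items()):
    print(f"Xid {code}: {n}")
```

On the real machine, `count_xids(read_kernel_log())` would give the per-code totals for the kernel messages in the journal, which should show whether Xid 13 dominates or whether the other codes are just as frequent.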
My system:
OS: Ubuntu 24.04.1 LTS x86_64
Kernel: 6.8.0-51-generic
Packages: 3009 (dpkg), 22 (flatpak)
Shell: zsh 5.9
Resolution: 3840x2160
DE: GNOME 46.0
WM: Mutter
WM Theme: Adwaita
Theme: Adwaita [GTK2/3]
Icons: Adwaita [GTK2/3]
Terminal: kitty
CPU: Intel Xeon E3-1240 v6 (8) @ 4.1
GPU: NVIDIA Quadro P4000
Memory: 4762MiB / 64157MiB
Current driver:
NVIDIA-SMI 550.120
Driver Version: 550.120
CUDA Version: 12.4
Here is the log:
nvidia-bug-report.log (623.2 KB)