Screen artifacts and system crash - RTX A4000

My Linux box currently has a Quadro P4000, which works perfectly.

I tried to upgrade the machine by installing an RTX A4000, but I had no luck.

When I first installed the RTX A4000, it worked flawlessly. There was no need to install a new driver or change any settings; it simply worked. After a few hours, though, the screen filled with artifacts and the system crashed.

Now the card crashes even more frequently, which makes the system unusable, so I had to put the P4000 back in.

I’m clueless: I’d like to understand whether the problem is hardware-related or software-related.

Some hints:

  • If I boot the system normally, the card crashes randomly, whatever task I’m doing
  • The crashes do not seem to be related to GPU load (I tried both 3D modeling and LLMs and it did not crash)
  • nvidia-smi reports temperatures of around 37-50°C at most
  • I ran gpu-burn to test the VRAM: the card did not crash and the report was OK.
  • If I boot in recovery mode, in console mode only, the card does not seem to crash.
  • journalctl reports tons of Xid 13 errors, plus other Xid errors (check the attached log)
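
In case it helps anyone reading the log, here is a minimal sketch of how the Xid errors could be tallied by code instead of scrolling through journalctl by hand. It assumes the kernel lines follow the usual `NVRM: Xid (PCI:...): <code>, ...` pattern; the function names `count_xids` and `read_kernel_log` are made up for this illustration, and I have not checked the pattern against every driver version.

```python
import re
import subprocess
from collections import Counter

# Assumed line shape (not copied from the attached log):
#   NVRM: Xid (PCI:0000:01:00): 13, pid=1234, ...
XID_RE = re.compile(r"NVRM: Xid \(([^)]*)\): (\d+)")


def count_xids(lines):
    """Return a Counter mapping each Xid code (int) to how often it appears."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


def read_kernel_log(boot="0"):
    """Read kernel messages for one boot via journalctl (0 = current boot)."""
    result = subprocess.run(
        ["journalctl", "-k", "-b", boot, "--no-pager"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Print the Xid codes seen in this boot, most frequent first.
    for code, n in count_xids(read_kernel_log()).most_common():
        print(f"Xid {code}: {n} occurrence(s)")
```

Since the system crashes, passing `"-1"` as `boot` should read the previous boot instead of the current one, which is probably the more useful log after a reboot.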

My system:

OS: Ubuntu 24.04.1 LTS x86_64 
Kernel: 6.8.0-51-generic 
Packages: 3009 (dpkg), 22 (flatpak)
Shell: zsh 5.9
Resolution: 3840x2160 
DE: GNOME 46.0 
WM: Mutter 
WM Theme: Adwaita 
Theme: Adwaita [GTK2/3] 
Icons: Adwaita [GTK2/3] 
Terminal: kitty 
CPU: Intel Xeon E3-1240 v6 (8) @ 4.1 
GPU: NVIDIA Quadro P4000 
Memory: 4762MiB / 64157MiB

Current driver:

NVIDIA-SMI 550.120                
Driver Version: 550.120        
CUDA Version: 12.4

Here is the log:

nvidia-bug-report.log (623.2 KB)