RTX 4080 goes off bus once after cold start

I’m seeing strange behavior on my new system:

  • AMD Ryzen 9 7950X 16-Core
  • Gigabyte X670E Aorus Master
  • DDR5 Corsair Vengeance 5200 MHz 16 GB
  • PNY Nvidia GeForce RTX 4080
  • PSU Corsair 850W

The PC is two months old.
I dual-boot Ubuntu 23.04 and Windows 11.
In both operating systems I have serious issues the first time the system boots after being powered off.

So, my PC is off. I turn it on and boot either Linux or Windows. In Linux, 1-3 minutes after startup the whole GUI freezes permanently, while the PC is still reachable via SSH. dmesg says:

GPU has fallen off the bus

All I can do is press the reset button (a reboot issued via SSH is not executed).
After rebooting, I can work the whole day without any problem, with renderings or other applications that use a lot of GPU. But every time I turn off the PC and boot it again, this error happens. Always.
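To pin down how long after boot the card drops, the kernel log can be scanned for that message. Below is a minimal Python sketch, assuming the log uses the usual dmesg layout with the seconds since boot in square brackets. The function name and the sample lines are made up for illustration and are not taken from the affected machine.

```python
import re

# Matches a dmesg-style line: "[  seconds.micros] message".
# The phrase is the one quoted from dmesg above.
PATTERN = re.compile(
    r"^\[\s*(?P<secs>\d+\.\d+)\]\s*(?P<msg>.*GPU has fallen off the bus.*)$"
)


def find_bus_drops(log_text):
    """Return (seconds_since_boot, message) for every matching log line."""
    hits = []
    for line in log_text.splitlines():
        match = PATTERN.match(line)
        if match:
            hits.append((float(match.group("secs")), match.group("msg")))
    return hits


# Hypothetical sample log, only to show the expected input format.
SAMPLE = """\
[    4.812345] example: some unrelated boot message
[  112.503210] NVRM: GPU has fallen off the bus.
[  112.503300] example: another unrelated message
"""

if __name__ == "__main__":
    for secs, msg in find_bus_drops(SAMPLE):
        print(f"{secs:.1f} s after boot: {msg}")
```

On the real system, the text passed to find_bus_drops would be the output of dmesg captured over SSH while the GUI is frozen.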

I tried all the available drivers in Ubuntu (I can’t insert two images…). Anyway:

  • nvidia-driver-525
  • nvidia-driver-525-open
  • nvidia-driver-525-server
  • xserver-xorg-video-nouveau

as well as the latest ones downloaded from the NVIDIA website. The behavior does not change.
I tried both Wayland and Xorg.

In Windows, the issue is slightly different, but I guess it has the same root cause.
When it happens, the screens go black and after a few minutes the PC reboots itself. I checked the Event Viewer, but the only critical error I found was the notification that the system had restarted abruptly.

Here, the issue fires within a few minutes after a cold start, but on subsequent reboots it still freezes, only after a longer time (i.e. half an hour, or even 1-2 hours).

In both cases it happens even when I am just staring at the desktop with no application running and CPU and GPU load close to 0%. To me, this should rule out a PSU problem.

Here is what I have tried so far:

  • moved the RAM module into the other slot (i.e. from A2 to B2)
  • ran memtest86+ several times (10+ passes)
  • tried all the available drivers in Ubuntu
  • switched between Xorg and Wayland
  • updated the system (APT in Linux or Windows Update)
  • reinstalled Ubuntu from scratch
  • reinstalled Windows from scratch
  • updated BIOS to the latest version
  • added pcie_aspm=off kernel option
  • removed the card from the slot and inserted again
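For the pcie_aspm=off item, it is worth confirming that the option really reached the running kernel. Here is a minimal Python sketch that parses the kernel command line. It assumes a Linux system exposing /proc/cmdline; the helper name kernel_param_value is made up for illustration.

```python
from pathlib import Path


def kernel_param_value(cmdline, name):
    """Return the value of `name` on a kernel command line.

    Returns "" for a bare flag without "=", and None if the parameter is
    absent. If the parameter appears more than once, the last one is returned.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")  # present on Linux only
    if proc_cmdline.exists():
        current = kernel_param_value(proc_cmdline.read_text(), "pcie_aspm")
        print("pcie_aspm =", current)
    else:
        print("/proc/cmdline not found; this check only works on Linux")
```

If this prints None, the option was not passed to the kernel, and the bootloader configuration would be the next thing to check.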

Could you please help me fix this problem? The system is unusable like this (especially in Windows).

Perhaps it’s a bit late, but my guess is that your PSU does not supply enough wattage. If my calculations are correct, your system draws around 800 W, without taking into account any peripherals such as fans, LEDs, etc.
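A rough power budget like this is easy to redo with your own numbers. The Python sketch below is only an illustration: the per-component wattages are assumptions (nominal vendor ratings for the CPU package limit and the GPU board power, plus a guessed allowance for everything else), not measurements from this system. Short transient spikes can go well above nominal ratings, so the result depends heavily on the figures plugged in.

```python
def power_budget(psu_watts, loads):
    """Return (total_draw, headroom, load_fraction) for a PSU and a load table."""
    total = sum(loads.values())
    return total, psu_watts - total, total / psu_watts


# Assumed figures in watts; replace them with values for your own parts.
LOADS = {
    "cpu": 230,    # assumption: nominal package power limit of the CPU
    "gpu": 320,    # assumption: nominal board power of the graphics card
    "other": 75,   # assumption: rough allowance for board, RAM, drives, fans
}

if __name__ == "__main__":
    total, headroom, fraction = power_budget(850, LOADS)
    print(f"estimated draw: {total} W")
    print(f"headroom on an 850 W PSU: {headroom} W")
    print(f"PSU load: {fraction:.0%}")
```

Raising the assumed figures to account for transients, or adding more peripherals to the table, shows how quickly the headroom shrinks.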

NECROMANCY!!!

:DDDDD
