I’m seeing strange behavior on my new system:
- AMD Ryzen 9 7950X 16-Core
- Gigabyte X670E Aorus Master
- DDR5 Corsair Vengeance 5200 MHz 16 GB
- PNY Nvidia GeForce RTX 4080
- PSU Corsair 850W
The PC is two months old.
I dual-boot Ubuntu 23.04 and Windows 11.
In both OSes I have serious issues the first time the system boots after being powered off.
Starting with the PC off, I turn it on and boot either Linux or Windows. In Linux, 1-3 minutes after startup the whole GUI freezes permanently, although the PC is still reachable via SSH. dmesg says:
GPU has fallen off the bus
The only way out is the reset button (a reboot issued via SSH is never executed).
After that reboot, I can work the whole day without any problem, even with rendering or other GPU-heavy applications. But every time I power off the PC and boot again, this error happens. Always.
I tried all the drivers available in Ubuntu:
- nvidia-driver-525
- nvidia-driver-525-open
- nvidia-driver-525-server
- xserver-xorg-video-nouveau
I also tried the latest drivers downloaded from the NVIDIA website. The behavior does not change.
I tried both Wayland and Xorg.
In Windows, the issue is slightly different, but I suspect it has the same root cause.
When it happens, the screens go black and after a few minutes the system reboots itself. I checked the Event Viewer, but the only critical error I found was the notification that the system had restarted abruptly.
Here, the issue appears within a few minutes of a cold start, but it also keeps freezing on subsequent reboots, just after a longer time (e.g. half an hour, or even 1-2 hours).
In both cases it happens even when I'm just looking at the desktop with no applications running and CPU and GPU usage close to 0%. To me, this seems to rule out a PSU problem.
Here is what I have tried so far:
- moved the RAM module into the other slot (i.e. from A2 to B2)
- ran memtest86+ several times (10+ passes)
- tried all the available drivers in Ubuntu
- tried to switch between xorg and wayland
- updated the system (APT in Linux or Windows Update)
- reinstalled Ubuntu from scratch
- reinstalled Windows from scratch
- updated BIOS to the latest version
- added the pcie_aspm=off kernel option
- removed the card from the slot and inserted it again
Could you please help me fix this problem? As it is, the system is unusable (especially in Windows).