RTX 4090 - Pageflip timed out | 580.95.05

My PC blackscreens somewhat regularly, usually while gaming. When this happens, the GPU fans rev up to 100% and a hard reset is required to restart the system.

Here is the basic system information:

Output from nvidia-smi:

Output from journalctl -k -b -1:

Full Output from journalctl -b -1 (starting from when I opened the game):

The output of journalctl --user-unit plasma-kwin_wayland shows nothing of interest at the time the GPU crashed; there are no entries anywhere near the timestamp.

Additional information:
I have two monitors, an AOC 25G3ZM set to 120Hz and an AOC Q27G3XMN (primary) set to 180Hz with HDR enabled. Brightness is controlled with DDC/CI and is set to different levels for each monitor.

A Valve Index Headset is connected to the GPU, but is disabled by default.

Both monitors blackscreen, audio keeps playing.

I haven’t figured out any reliable steps to reproduce this error yet. It seems to happen intermittently and without warning.

If there’s any more information required, I’d be happy to assist.

Getting the same issue here. I can no longer get past the Windows logon screen or SDDM either.

I was able to resolve my issue: setting pcie_aspm=off in the GRUB boot options (GRUB_CMDLINE_LINUX_DEFAULT) and reseating the wiring one more time actually helped.
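For anyone who wants to try the same thing, here is a rough sketch of what the change could look like in /etc/default/grub. The existing "quiet" option is only a placeholder for whatever your distribution already has on that line, and the regeneration command differs between distributions, so treat the commented commands as the usual candidates rather than a definitive recipe.

```shell
# /etc/default/grub (fragment; this file is sourced as shell by grub-mkconfig)
# "quiet" is an assumed pre-existing option; keep whatever your line already
# contains and append pcie_aspm=off to it.
GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_aspm=off"

# After saving the file, regenerate the GRUB config and reboot.
# Which command applies depends on the distribution:
#   sudo update-grub                                # Debian/Ubuntu
#   sudo grub-mkconfig -o /boot/grub/grub.cfg       # Arch and similar
#   sudo grub2-mkconfig -o /boot/grub2/grub.cfg     # Fedora/openSUSE
```

The option only takes effect on the next boot, so a reboot is needed before judging whether it changes anything.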

I strongly suspect pcie_aspm is the culprit here. After setting this boot option, the crashes became very rare. At some point after that, I replaced my 1000W Seasonic PSU with a 1200W SAMA PSU, which has a direct 12VHPWR connection rather than the PCIe → 12VHPWR adapter that was required with the Seasonic. Since then, I haven’t had a crash related to this issue.

While trying to resolve this error, I also set some other boot options which I still haven’t removed, so for completeness’ sake (maybe it helps someone), here’s the full set (though I assume anything aside from pcie_aspm is redundant): nvidia_drm.modeset=1 nvidia_drm.fbdev=1 nvidia.NVreg_EnableGpuFirmware=0 pcie_aspm=off
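To confirm that the options actually reached the running kernel, something like the following could be used. It is only a sketch: has_param and report_params are helper names made up for this example, and the check simply looks for each option as a whole word in the command line string it is given.

```shell
#!/bin/sh
# has_param CMDLINE PARAM
# Succeeds if PARAM appears as a whole space-separated word in CMDLINE.
# (Hypothetical helper for illustration, not part of any tool.)
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# report_params CMDLINE PARAM...
# Prints one line per PARAM saying whether it is present in CMDLINE.
report_params() {
    cmdline=$1
    shift
    for p in "$@"; do
        if has_param "$cmdline" "$p"; then
            echo "active:  $p"
        else
            echo "missing: $p"
        fi
    done
}

# Usage on a live system (the kernel exposes its command line in /proc/cmdline):
#   report_params "$(cat /proc/cmdline)" pcie_aspm=off
```

Further options from the list can be appended as extra arguments. If pcie_aspm=off shows up as missing after a reboot, the GRUB config was most likely not regenerated.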

If you can, try hooking your PC up to a power supply with a higher rating. Since this was only an issue on Linux for me and never on Windows, I suspect something differs in how the two handle power budgeting. I’ve no clue about this stuff, but I can imagine that Linux may allow for higher transient spikes, which then cause the GPU to become unstable and go into reset.

Hope this helps!

I’ve got a Corsair HX1000i.
But I now also suspect a bad cable, as I decided to resurrect my Titan Xp.
I had to switch which cables I used because I get no GPU detected with one set.

I recall there being issues with 30XX series cards and some power supplies, where the cards could trip the over-current protection of the PSUs. While this probably isn’t the case here (since usually the PC would just shut down without any errors in the log), checking the PSU out might still be worth a shot, depending on how easily you can access a new one in the same power range or above.

Investing in a proper 12VHPWR cable that directly connects to the PSU is probably also worth it.