Regression: nvidia-modeset Kernel Panic & kwin_wayland Crash on 5060 Ti (Blackwell) under High VRAM Load

Hello,

I am adding my report of a hard system freeze and kernel panic on Wayland. This appears to be a regression triggered by high VRAM usage, and it seems specific to the new 50-series (Blackwell) architecture.

The Problem

The system runs perfectly under light loads. However, when running any application that causes high VRAM usage (like AI workloads or games like Cities: Skylines II), the entire system freezes, all displays lose signal, and I am forced to hard-reboot.

The crash is 100% reproducible under these high-load conditions.

Hardware & Software

  • GPU: NVIDIA GeForce RTX 5060 Ti 16 GB

  • Display: Samsung Odyssey G7 (connected via DisplayPort)

  • OS 1 (Crash Confirmed): Bazzite (Fedora 43)

    • Driver: 580.95.05

  • OS 2 (Crash Confirmed): CachyOS

    • Driver: 580.105.08

Key Finding: 3070 vs. 5060 Ti

This is the most important detail: I never experienced this bug on my previous RTX 3070 (Ampere) card. The crashes began immediately after I upgraded to the 5060 Ti.

The fact that the same crash occurs on two different distributions (Bazzite, CachyOS) and two different recent driver versions (580.95.05, 580.105.08) strongly points to a driver-level bug specific to the new Blackwell architecture.

How to Replicate

The bug is consistently triggered by VRAM pressure.

  • Fails (Hard Crash):

    1. Running any local AI workload that maxes out VRAM.

    2. Loading into the main menu of Cities: Skylines II (which is known to have very high VRAM usage).

  • Works Perfectly (No Crash):

    1. Running lighter games like Farming Simulator 25.

    2. All normal desktop usage.
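To put a number on the VRAM pressure at the moment of the freeze, usage can be logged to disk while reproducing. Below is a minimal Python sketch, assuming nvidia-smi is on the PATH. The function names, the one-second interval and the vram.log file name are illustrative choices of mine, not anything taken from the driver or from my logs.

```python
import os
import subprocess
import time

# Standard nvidia-smi query: used and total memory in MiB, one CSV line per GPU.
QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]

def parse_vram(line):
    """Parse one 'used, total' CSV line (MiB) into (used, total, percent)."""
    used_str, total_str = line.split(",")
    used, total = int(used_str), int(total_str)
    return used, total, 100.0 * used / total

def log_vram(path="vram.log", interval=1.0):
    """Append one timestamped sample per interval until interrupted (hypothetical helper)."""
    with open(path, "a") as out:
        while True:
            # Only the first GPU is sampled; this assumes a single-GPU system.
            first_gpu = subprocess.check_output(QUERY, text=True).splitlines()[0]
            used, total, pct = parse_vram(first_gpu)
            out.write(f"{time.strftime('%H:%M:%S')} {used}/{total} MiB ({pct:.1f}%)\n")
            # Flush and sync so the last sample has a chance of surviving a hard freeze.
            out.flush()
            os.fsync(out.fileno())
            time.sleep(interval)
```

Calling log_vram() in a terminal before launching the workload should leave the last recorded usage in vram.log after the reboot, which can then be lined up with the journalctl timestamps.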

Log Evidence (The “Smoking Gun”)

My journalctl logs show the exact moment of failure. The NVIDIA kernel driver panics, and the failure then cascades, killing kwin_wayland and the rest of the graphics stack.

The crash sequence is:

  1. kwin_wayland logs Create Context failed "EGL_BAD_CONTEXT".

  2. The kernel is flooded with [drm:__nv_drm_semsurf_wait_fence_work_cb [nvidia_drm]] *ERROR*.

  3. The kernel is then spammed with two nvidia-modeset messages: "nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state" and "The requested configuration of display devices... is not supported".

  4. kwin_wayland, Xwayland, and any open graphical apps all coredump.
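For anyone who wants to check their own logs for the same sequence, here is a rough Python sketch that scans a saved journalctl dump for the messages listed above and reports where each one first appears and how often. The regular expressions are built from the message text quoted in this post. The short signature names and the function names are labels I made up for illustration.

```python
import re

# Signatures taken from the crash sequence above, in the order they appeared.
# The short names on the left are hypothetical labels, not driver terminology.
SIGNATURES = [
    ("egl_bad_context", re.compile(r"Create Context failed.*EGL_BAD_CONTEXT")),
    ("semsurf_fence", re.compile(r"__nv_drm_semsurf_wait_fence_work_cb")),
    ("modeset_channel", re.compile(r"Failed to query display engine channel state")),
    ("modeset_config", re.compile(r"requested configuration of display devices")),
]

def summarize(lines):
    """Return {name: (first_line_index, count)} for each signature that occurs."""
    summary = {}
    for index, line in enumerate(lines):
        for name, pattern in SIGNATURES:
            if pattern.search(line):
                first, count = summary.get(name, (index, 0))
                summary[name] = (first, count + 1)
    return summary

def summarize_file(path):
    """Run summarize() over a saved log, e.g. the output of journalctl -b -1."""
    with open(path, errors="replace") as handle:
        return summarize(handle)
```

Running summarize_file("log.txt") on the attached log should show whether the EGL error really precedes the nvidia_drm and nvidia-modeset floods, and makes it easier to compare against logs from other affected users.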

This appears to be the same nv_drm_atomic_commit failure that was discussed in the "580 release feedback & discussion" thread (Post #1) and reported there by other users (Post #31). It was supposedly fixed in 580.65.06, but it has either regressed or was never fully fixed for new architectures.

I have attached my full journalctl -b -1 output from the Bazzite crash.

log.txt (250.0 KB)

nvidia-bug-report.log.gz (505.3 KB)