[BUG Report]

Hi,

I’m experiencing full system hangs when running demanding titles via Steam/Proton on my RTX PRO 6000 Blackwell Max-Q. This has happened across multiple games including Hogwarts Legacy (DX12), Cyberpunk 2077 (DX12), Total War: Warhammer III (DX11/Vulkan), and other GPU-intensive titles. The symptom is always the same: the display freezes completely while audio continues playing. The Wayland compositor becomes unresponsive, TTY switching is blocked, and only a hard reboot recovers the system.

Looking at the kernel log from the previous boot, the crash follows a pattern of Xid 32 errors during gameplay, eventually culminating in a fatal Xid 56 followed by a DRM flip event timeout:

[13132.342106] NVRM: Xid (PCI:0000:01:00): 32, pid=27514, name=HogwartsLegacy., channel 0x00000014 intr0 00040000
[13132.355013] NVRM: Xid (PCI:0000:01:00): 32, pid=27514, name=HogwartsLegacy., channel 0x00000014 intr0 00040000
[13641.741034] NVRM: Xid (PCI:0000:01:00): 32, pid=30738, name=HogwartsLegacy., channel 0x00000014 intr0 00040000
[13641.753002] NVRM: Xid (PCI:0000:01:00): 32, pid=30738, name=HogwartsLegacy., channel 0x00000014 intr0 00040000
[14901.195861] NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000007 00000000 00000000 00000001 00000000
[14920.411036] [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0

The Xid 32 errors occur in pairs during gameplay across two separate game launches. About 21 minutes after the last Xid 32, the Xid 56 fires, and 19 seconds later the DRM flip timeout locks up all display output.
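For anyone who wants to check the same timing in their own logs, here is a minimal Python sketch that extracts Xid events from kernel log text and measures the gap between the last Xid 32 and the Xid 56 that follows it. The function names and the regular expression are my own illustration, not part of any NVIDIA tool, and the pattern assumes the `[seconds] NVRM: Xid (PCI:...): code,` layout shown in the excerpt above.

```python
import re

# Assumed line layout, based only on the excerpt in this post:
# "[13132.342106] NVRM: Xid (PCI:0000:01:00): 32, pid=..."
XID_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+NVRM: Xid \(PCI:[0-9a-fA-F:.]+\): (?P<code>\d+),"
)


def parse_xid_events(log_text):
    """Return a list of (timestamp_seconds, xid_code) tuples in log order."""
    events = []
    for line in log_text.splitlines():
        match = XID_RE.match(line.strip())
        if match:
            events.append((float(match.group("ts")), int(match.group("code"))))
    return events


def gap_after_last(events, first_code, then_code):
    """Seconds from the last `first_code` event to the next `then_code` event.

    Returns None if either event is missing.
    """
    last_first = max((ts for ts, code in events if code == first_code), default=None)
    if last_first is None:
        return None
    later = [ts for ts, code in events if code == then_code and ts > last_first]
    return min(later) - last_first if later else None


# Two lines copied from the kernel log quoted above.
SAMPLE = """\
[13641.753002] NVRM: Xid (PCI:0000:01:00): 32, pid=30738, name=HogwartsLegacy., channel 0x00000014 intr0 00040000
[14901.195861] NVRM: Xid (PCI:0000:01:00): 56, CMDre 00000007 00000000 00000000 00000001 00000000
"""

if __name__ == "__main__":
    gap = gap_after_last(parse_xid_events(SAMPLE), 32, 56)
    print(f"Xid 32 -> Xid 56 gap: {gap:.1f} s ({gap / 60:.1f} min)")
```

To run it against a real log, pass in the saved text of the previous boot's kernel log instead of `SAMPLE`.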

How to reproduce:

  1. Boot with nvidia_drm.modeset=1 nvidia_drm.fbdev=1 pcie_aspm=off
  2. Start a Wayland session (niri compositor) on a 4K 240 Hz display via DisplayPort
  3. Launch any demanding Proton title (Hogwarts Legacy, Cyberpunk 2077, Total War: Warhammer III, etc.)
  4. Play for some minutes until the screen freezes
  5. System is fully unresponsive — no TTY, no recovery without hard reboot
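Before comparing results, it may help to confirm that the parameters from step 1 actually reached the kernel command line. The sketch below is a hedged illustration: `parse_cmdline` and `missing_params` are hypothetical helper names, it only inspects `/proc/cmdline` (parameters set through modprobe configuration would not show up there), and it does not account for the kernel treating `-` and `_` as interchangeable in parameter names.

```python
# Parameters listed in step 1 of the reproduction steps.
REQUIRED = {
    "nvidia_drm.modeset": "1",
    "nvidia_drm.fbdev": "1",
    "pcie_aspm": "off",
}


def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' (for example 'quiet') map to None.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def missing_params(cmdline, required=REQUIRED):
    """Return the required parameters that are absent or set to another value."""
    params = parse_cmdline(cmdline)
    return {key: value for key, value in required.items() if params.get(key) != value}


if __name__ == "__main__":
    try:
        with open("/proc/cmdline", encoding="utf-8") as handle:
            current = handle.read()
    except OSError:
        current = ""  # not on Linux, or /proc is unavailable
    print("missing or different:", missing_params(current) or "none")
```

An empty result means all three parameters are present with the expected values on the running kernel.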

The kernel log above was captured during a Hogwarts Legacy session, but the same crash pattern has occurred with Cyberpunk 2077, Total War: Warhammer III, and other demanding titles. It does not appear to be game-specific — rather it seems to affect any GPU-intensive Proton workload.

One thing to note: during the first game launch in this particular session, a vLLM inference server was consuming ~87 GB of the 96 GB VRAM, so the initial Xid 32 errors may be related to memory pressure. However, the server was stopped before the final crash, leaving ~95 GB free. The fatal Xid 56 occurred with plenty of available VRAM. The crashes with other titles listed above happened without any VRAM contention.

System details:

  • GPU: NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition (GB202GL, rev a1)
  • VBIOS: 98.02.6A.00.03
  • Driver: 595.58.03 (Open Kernel Module)
  • CUDA: 13.2
  • OS: Arch Linux (rolling)
  • Kernel: 6.19.10-1-cachyos (PREEMPT_DYNAMIC, clang 22.1.1)
  • CPU: AMD Ryzen 7 9800X3D
  • RAM: 32 GB
  • Display: ASUS ROG Swift PG32UCDMR, 3840x2160 @ 240 Hz, DisplayPort, scale 2.0
  • Compositor: niri (Wayland, no X server)
  • Proton: tested with proton-cachyos 10.0-20260324 and GE-Proton10-34 (Wine 10.0 Staging)
  • Affected games: Hogwarts Legacy (DX12), Cyberpunk 2077 (DX12), Total War: Warhammer III (DX11/Vulkan), and other demanding titles — all via VKD3D-Proton/DXVK

The nvidia-bug-report.log.gz was generated after reboot since the crash required a hard power cycle. It is attached below.

Thanks for looking into this.

nvidia-bug-report.log.gz (752.9 KB)

One more thing to note: I was using the same setup on the 590 driver and didn’t experience any similar crashes.

I’ve seen similar issues on a 5090 at 4K 240 Hz. I switched my display to 144 Hz and it is much more stable now. Let me know if it helps you. Obviously not an ideal workaround.

I changed the refresh rate on the same monitor down to 144 Hz and have seen no related crashes (Xid 32/56), at least so far.

On Sat, Apr 4, 2026 at 7:19 p.m., JMro <notifications@nvidia.discoursemail.com> wrote:

The bug report log identifies the trigger.

All NVIDIA system services are disabled on this system:

nvidia-suspend.service    — disabled
nvidia-hibernate.service  — disabled
nvidia-resume.service     — disabled
nvidia-powerd.service     — disabled
nvidia-persistenced.service — disabled

The system entered S3 sleep approximately one hour after boot. With nvidia-suspend and nvidia-resume disabled, GPU state was not preserved through S3. The driver continued after resume but the GPU’s internal channel state was left in an undefined condition.

Six hours later, under VKD3D-Proton load, channel 0x00000014 produced Xid 32 errors across two separate game sessions:

18:39:51 — Xid 32, pid 27514, channel 0x00000014
18:48:20 — Xid 32, pid 30738, channel 0x00000014

Twenty-one minutes after the last Xid 32, the display engine failed:

19:09:19 — Xid 56, CMDre 00000007
19:09:39 — DRM flip timeout on head 0

The fix is to enable the NVIDIA suspend/resume services:

sudo systemctl enable nvidia-suspend.service
sudo systemctl enable nvidia-resume.service
sudo systemctl enable nvidia-hibernate.service

Or disable S3 sleep entirely if suspend is not required.
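To verify the result after enabling the services, a small Python sketch along these lines could report which of the three units are still not enabled. It relies on `systemctl is-enabled`, which prints the unit's enablement state; the helper names `query_state` and `units_needing_enable` are hypothetical, and any state other than `enabled` is treated as needing attention, which is a simplifying assumption.

```python
import subprocess

# Unit names taken from the commands above.
UNITS = [
    "nvidia-suspend.service",
    "nvidia-resume.service",
    "nvidia-hibernate.service",
]


def query_state(unit):
    """Return the state printed by `systemctl is-enabled`, or 'unknown'."""
    try:
        result = subprocess.run(
            ["systemctl", "is-enabled", unit],
            capture_output=True,
            text=True,
            check=False,  # a non-zero exit is expected for disabled units
        )
    except FileNotFoundError:
        return "unknown"  # systemctl is not installed on this machine
    return result.stdout.strip() or "unknown"


def units_needing_enable(units=UNITS, query=query_state):
    """Return the units whose reported state is anything other than 'enabled'."""
    return [unit for unit in units if query(unit) != "enabled"]


if __name__ == "__main__":
    pending = units_needing_enable()
    if pending:
        print("Not enabled:", ", ".join(pending))
    else:
        print("All NVIDIA suspend/resume units are enabled.")
```

The `query` parameter exists only so the check can be exercised without a running systemd; in normal use the default is enough.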

Regarding the 595 vs 590 difference noted in post #2 — this cannot be confirmed from this log alone. A log from a 590 session that also went through S3 would be needed to compare.