Hardware: NVIDIA GeForce RTX 5080 (16GB)
Driver Version: 595.71.05 (Open Kernel Module)
OS: Proxmox VE 9.1.11 (Kernel 7.0.2-3-pve / Debian 12 based)
Workload: LLM Inference (llama.cpp)
1. Problem Description
The GPU enters an unrecoverable state (Xid 62) during high-context LLM inference tasks. The failure is highly deterministic and specifically reproducible when using Multi-Token Prediction (MTP) model architectures (e.g., Qwen3.6-35B-MTP) at context windows $\ge$ 128k. Standard MoE and Dense models are perfectly stable, suggesting a logic bug in the GSP scheduler when handling the specific tensor scheduling required by MTP architectures on Blackwell.
2. Environment & Mitigations (Active during crash)
To rule out environmental or hardware defects, this system operates with strict constraints. The crash occurred despite these active mitigations:
- Power Capping: Active power limit set to 288 W (80% of 360 W TDP) via
nvidia-smi -pl 288to prevent transient voltage spikes. - Thermal Management: GPU temperature at the exact time of the crash was a very cool 52°C. Active cooling was running at 92% fan speed.
- Persistence Mode:
nvidia-persistencedwas actively running on the host. - Isolation: The workload was running with direct GPU passthrough; no concurrent GPU processes were active on the host or in other containers.
3. Error Logs (GSP-CrashCat)
The GSP firmware consistently fails in the exact same scheduler thread across multiple incidents.
- GSP Build ID:
6332050fe14352839619385982f83700835ef570 - GSP Task:
partition:4#0, task:3(RISC-V Scheduler) - Forensic Address:
GSP task watchdog timeout @ pc:0x1b3e200
Kernel Trace:
NVRM: Xid (PCI:0000:01:00): 62, 323f0f30 0000c3d8 00000000 206dae68 206da022 206da190 206d85f6 206d8d86
NVRM: GPU0 _kgspRpcGspEventPmuHalted: Received signal from GSP that PMU has halted.
NVRM: Xid (PCI:0000:01:00): 154, GPU recovery action changed from 0x0 (None) to 0x1 (GPU Reset Required)
NVRM: GPU0 kgspHealthCheck_TU102: ********** GSP-CrashCat Report ***********
NVRM: RISC-V CSR State: sstatus:0x0000000200000020 sscratch:0xffffffffa3013970 sie:0x0000000000000220 sip:0x0000000000000020
GSP task watchdog timeout @ pc:0x1b3e200, partition:4#0, task:3
4. Steps to Reproduce
- Load Qwen3.6-35B-A3B-MTP-GGUF (IQ2_M or Q2_K_XL).
- Set context size to 131,072.
- Set
flash-attn = onandfit = oninllama.cpp. - The GSP will reliably crash during the “fitting params to device memory” phase, even when VRAM usage is ~13.3 GB (leaving nearly 3 GB of physical headroom).
5. Technical Conclusion
The presence of strict power capping and stable, low thermals (52°C) at the time of failure strongly indicates that this is not a hardware overheating or power-delivery issue.
The deterministic nature of the crash (specific to MTP architectures and high context fitting) points to a firmware scheduler regression/deadlock in the 595.xx Blackwell GSP driver branch.
I have attached the nvidia-bug-report-2026-05-24.log.gz (768.3 KB) generated immediately after the Xid 62 event.