I’m seeing strong evidence of a fan hardware failure on my DGX Spark and am looking for confirmation from others who’ve seen this or from NVIDIA engineers before pursuing an RMA.
The smoking gun — anomalous idle temperature:
On 2026-05-11, my 5-second telemetry shows the machine reaching 80–86°C on acpitz at 0% GPU utilization and 13W power draw, approximately 30°C above the normal idle baseline of ~52°C (logged at 4–5W). This occurred before any GPU workload started.
| Time (UTC) | acpitz | GPU°C | GPU W | GPU % | Notes |
|---|---|---|---|---|---|
| 02:32–06:07 | ~52°C | ~40°C | 4–5W | 0% | Normal idle |
| 06:08–06:32 | 80–86°C | 52–58°C | 13W | 0% | Anomaly, no load |
| 06:59–08:33 | 93–94°C | 80–84°C | 83–91W | 96% | GPU workload started |
| Peak | 94.3°C | 84°C | 93W | 96% | 10.5°C below ACPI critical |
The machine hard-froze later that day at 19:21 UTC during a GPU run.
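For anyone who wants to scan their own logs for the same "hot while idle" pattern, here is a minimal sketch. The column names (`time_utc`, `acpitz_c`, `gpu_w`, `gpu_util`), the 70°C threshold, and the 20W power cutoff are all assumptions for illustration, not the actual header of dgx-flightrec.csv, and the sample rows are made-up values shaped like the table above.

```python
import csv
import io

# Hypothetical column names; the real CSV header may differ.
COLS = {"time": "time_utc", "acpitz": "acpitz_c", "power": "gpu_w", "util": "gpu_util"}


def find_idle_hot_spans(rows, temp_threshold_c=70.0, max_util=0.0, max_power_w=20.0):
    """Return (start, end, peak_temp) for each contiguous run of hot-while-idle samples."""
    spans = []
    current = None  # [start_time, end_time, peak_temp] of the run in progress
    for row in rows:
        temp = float(row[COLS["acpitz"]])
        util = float(row[COLS["util"]])
        power = float(row[COLS["power"]])
        hot_idle = temp >= temp_threshold_c and util <= max_util and power <= max_power_w
        if hot_idle:
            if current is None:
                current = [row[COLS["time"]], row[COLS["time"]], temp]
            else:
                current[1] = row[COLS["time"]]
                current[2] = max(current[2], temp)
        elif current is not None:
            spans.append(tuple(current))
            current = None
    if current is not None:
        spans.append(tuple(current))
    return spans


def load_rows(csv_text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Illustrative samples only (assumed layout), not real telemetry.
SAMPLE = """time_utc,acpitz_c,gpu_c,gpu_w,gpu_util
06:07:55,52.0,40,5,0
06:08:00,80.1,52,13,0
06:08:05,86.0,58,13,0
06:59:00,93.0,80,83,96
"""

if __name__ == "__main__":
    for start, end, peak in find_idle_hot_spans(load_rows(SAMPLE)):
        print(f"{start} to {end}: peak acpitz {peak:.1f} C at idle")
```

The thresholds are deliberately loose; tightening `max_power_w` toward the idle figures in the table would make the filter stricter.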
What I’ve already checked:
- Fan nodes in OS: none. `/sys/class/hwmon/` has no `fan*_input` or PWM nodes. `nvidia-smi --query-gpu=fan.speed` returns `[N/A]`. I understand fan control is firmware-only on GB10.
- Headless boot bug: not applicable. GDM is running, monitor attached.
- acpid: active and enabled.
- USB-C power budget: all external USB devices disconnected. Current idle is normal (42°C at 6.5W).
- UEFI firmware: `5.36_0ACUM018` (2025-08-06).
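The sysfs part of the checks above can be repeated with a short script. This is a sketch that assumes the standard Linux hwmon and thermal sysfs layout (`/sys/class/hwmon/hwmon*/fan*_input`, `pwm*`, and `/sys/class/thermal/thermal_zone*/{type,temp}`); the function names are hypothetical, and whether GB10 exposes anything beyond these paths is not something this script can tell you.

```python
import glob
import os


def list_fan_nodes(hwmon_root="/sys/class/hwmon"):
    """Return sorted paths of fan*_input and pwm* files under each hwmon device."""
    found = []
    for pattern in ("fan*_input", "pwm*"):
        found.extend(glob.glob(os.path.join(hwmon_root, "hwmon*", pattern)))
    return sorted(found)


def read_thermal_zones(thermal_root="/sys/class/thermal"):
    """Return {zone_dir_name: (type, temp_c)} for each readable thermal zone."""
    zones = {}
    for zone in sorted(glob.glob(os.path.join(thermal_root, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "type")) as f:
                zone_type = f.read().strip()
            with open(os.path.join(zone, "temp")) as f:
                # The thermal sysfs interface reports millidegrees Celsius.
                temp_c = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # skip zones that are missing files or not readable
        zones[os.path.basename(zone)] = (zone_type, temp_c)
    return zones


if __name__ == "__main__":
    fans = list_fan_nodes()
    print("fan/PWM nodes:", fans if fans else "none found")
    for name, (zone_type, temp_c) in read_thermal_zones().items():
        print(f"{name}: {zone_type} {temp_c:.1f} C")
```

An empty fan list here matches the "none" result reported above; it only shows that the OS exposes no fan interface, not whether the fan itself is spinning.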
My question: Is the 80°C-at-idle pattern (30°C above baseline with zero GPU load) consistent with a fan hardware failure on the GB10? Has anyone else seen this? Is there any firmware-level diagnostic I’m missing before I conclude it’s a dead fan motor?
My telemetry CSV (dgx-flightrec.csv) is attached to the support case below. It logs mem, acpitz, GPU temp, GPU power, and GPU util at 5-second intervals across the full day.
System serial: 1983925016936
Support case: #260511-000332