Summary
A Dell-branded DGX Spark stuck on BIOS 5.36_2.1.0 dated 2025-09-23
deterministically hangs the entire box during prefill of large-context
(~200k token) vLLM requests. Three other Dell DGX Sparks bought a few
months later, with the same BIOS version string but build date
2025-10-08, run the identical workload cleanly. Dell’s LVFS channel
(fwupdmgr get-updates) reports no updates available for the 09/23
unit, so the apparent fix is unreachable through documented channels.
Hardware
- 4× Dell DGX Spark (SK Hynix 128 GB unified memory, GB10)
- All on Ubuntu 24.04, kernel
6.17.0-1014-nvidia - All on NVIDIA driver
580.142 - All on
dgx-release 7.5.0,dgx-spark-ota-update-meta 26.04.1,
nvidia-firmware-580-580.142
The split
| Box | BIOS version | BIOS date | 200k vLLM test |
|---|---|---|---|
| aibox01 | 5.36_2.1.0 | 2025-09-23 | freezes box |
| aibox02 | 5.36_2.1.0 | 2025-09-23 | (untested — runs llama.cpp) |
| aibox03 | 5.36_2.1.0 | 2025-10-08 | passes (358 s) |
| aibox04 | 5.36_2.1.0 | 2025-10-08 | passes (371 s) |
| aibox05 | 5.36_2.1.0 | 2025-10-08 | (untested — runs llama.cpp) |
fwupdmgr get-updates on the 09/23 unit:
Devices with the latest available firmware version:
• Embedded Controller
• TPM
• UEFI Device Firmware
• UEFI Device Firmware
No updates available
So the 10/08 BIOS appears to have shipped factory-flashed on later
units only and is not (yet?) offered to older serial numbers via LVFS
or via the DGX Dashboard’s update UI.
Reproducer
A single chat completion request with ~200k input tokens against a
vLLM container running Qwen/Qwen3.6-27B-int4-AutoRound with MTP
speculative decoding. Service launch:
vllm serve Lorbus/Qwen3.6-27B-int4-AutoRound \
--tensor-parallel-size 1 \
--max-model-len 262144 \
--gpu-memory-utilization 0.75 \
--attention-backend flashinfer \
--max-num-seqs 2 \
--kv-cache-dtype fp8_e4m3 \
--quantization auto_round \
--enable-prefix-caching --enable-chunked-prefill \
--speculative-config '{"method":"mtp","num_speculative_tokens":3}' \
--reasoning-parser qwen3
Driver: a POST /v1/chat/completions with one user message of about
197 000 tokens of filler text. Reproduces in 4/4 attempts on aibox01
(09/23 BIOS); 0/2 on aibox04 (10/08 BIOS, same model, same vLLM).
What the box does when it hangs
-
TCP connection accepted, no response chunk ever returned
-
nvidia-smiblocks indefinitely -
pingstops answering after a few minutes -
Only recovery is a hard power cycle
-
A previous incident (an earlier date, see kernel log below) showed
the signature in dmesg before the network died:NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from _memdescAllocInternal(pMemDesc) @ mem_desc.c:1359Same signature has been documented elsewhere for vLLM on Spark under
load (vllm/vllm#40756, NVIDIA/open-gpu-kernel-modules#968). What is
novel here: it’s reproducible on a single request on the older BIOS
and does not reproduce on the newer BIOS.
Three asks
- Is there a known BIOS regression in
5.36_2.1.0(09/23 build) that
was silently fixed in the 10/08 rebuild? Release notes for the
intermediate firmware would help us confirm. - Why does
fwupdmgrreport “no updates available” on the older unit
when a newer build of the same version string clearly exists on
factory-fresh hardware? Is the update queued in a Dell channel that
hasn’t propagated, or is it shipped only as a factory image? - What is the supported path for a Dell DGX Spark owner to obtain the
newer firmware build short of opening a Dell support ticket?
Happy to provide kernel logs, full dmidecode output, or any other
diagnostics that would help.