Dell DGX Spark BIOS dated 09/23/2025 hangs on long-context vLLM inference

Summary

A Dell-branded DGX Spark stuck on BIOS 5.36_2.1.0 dated 2025-09-23
deterministically hangs the entire box during prefill of large-context
(~200k token) vLLM requests. Three other Dell DGX Sparks bought a few
months later, with the same BIOS version string but build date
2025-10-08, run the identical workload cleanly. Dell’s LVFS channel
(fwupdmgr get-updates) reports no updates available for the 09/23
unit, so the apparent fix is unreachable through documented channels.

Hardware

  • 4× Dell DGX Spark (SK Hynix 128 GB unified memory, GB10)
  • All on Ubuntu 24.04, kernel 6.17.0-1014-nvidia
  • All on NVIDIA driver 580.142
  • All on dgx-release 7.5.0, dgx-spark-ota-update-meta 26.04.1,
    nvidia-firmware-580-580.142

The split

Box BIOS version BIOS date 200k vLLM test
aibox01 5.36_2.1.0 2025-09-23 freezes box
aibox02 5.36_2.1.0 2025-09-23 (untested — runs llama.cpp)
aibox03 5.36_2.1.0 2025-10-08 passes (358 s)
aibox04 5.36_2.1.0 2025-10-08 passes (371 s)
aibox05 5.36_2.1.0 2025-10-08 (untested — runs llama.cpp)

fwupdmgr get-updates on the 09/23 unit:

Devices with the latest available firmware version:
 • Embedded Controller
 • TPM
 • UEFI Device Firmware
 • UEFI Device Firmware
No updates available

So the 10/08 BIOS appears to have shipped factory-flashed on later
units only and is not (yet?) offered to older serial numbers via LVFS
or via the DGX Dashboard’s update UI.

Reproducer

A single chat completion request with ~200k input tokens against a
vLLM container running Qwen/Qwen3.6-27B-int4-AutoRound with MTP
speculative decoding. Service launch:

vllm serve Lorbus/Qwen3.6-27B-int4-AutoRound \
    --tensor-parallel-size 1 \
    --max-model-len 262144 \
    --gpu-memory-utilization 0.75 \
    --attention-backend flashinfer \
    --max-num-seqs 2 \
    --kv-cache-dtype fp8_e4m3 \
    --quantization auto_round \
    --enable-prefix-caching --enable-chunked-prefill \
    --speculative-config '{"method":"mtp","num_speculative_tokens":3}' \
    --reasoning-parser qwen3

Driver: a POST /v1/chat/completions with one user message of about
197 000 tokens of filler text. Reproduces in 4/4 attempts on aibox01
(09/23 BIOS); 0/2 on aibox04 (10/08 BIOS, same model, same vLLM).

What the box does when it hangs

  • TCP connection accepted, no response chunk ever returned

  • nvidia-smi blocks indefinitely

  • ping stops answering after a few minutes

  • Only recovery is a hard power cycle

  • A previous incident (an earlier date, see kernel log below) showed
    the signature in dmesg before the network died:

    NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory
    [NV_ERR_NO_MEMORY] (0x00000051) returned from
    _memdescAllocInternal(pMemDesc) @ mem_desc.c:1359
    

    Same signature has been documented elsewhere for vLLM on Spark under
    load (vllm/vllm#40756, NVIDIA/open-gpu-kernel-modules#968). What is
    novel here: it’s reproducible on a single request on the older BIOS
    and does not reproduce on the newer BIOS.

Three asks

  1. Is there a known BIOS regression in 5.36_2.1.0 (09/23 build) that
    was silently fixed in the 10/08 rebuild? Release notes for the
    intermediate firmware would help us confirm.
  2. Why does fwupdmgr report “no updates available” on the older unit
    when a newer build of the same version string clearly exists on
    factory-fresh hardware? Is the update queued in a Dell channel that
    hasn’t propagated, or is it shipped only as a factory image?
  3. What is the supported path for a Dell DGX Spark owner to obtain the
    newer firmware build short of opening a Dell support ticket?

Happy to provide kernel logs, full dmidecode output, or any other
diagnostics that would help.