Title: ASUS Ascent GX10 (GB10) hard power-off / unclean reboot under vLLM (gpt-oss-120b, long context)

I’m seeing repeated hard power-offs / resets on an ASUS Ascent GX10 (host gx10-4323, GB10 / Blackwell) while running heavy inference in a Docker vLLM container. This does not look like GPU thermal shutdown (I often see ~85–90W and ~60–70°C right before the disconnect), but rather an abrupt power cut / firmware reset (“unclean shutdown”).

System info:

  • System: ASUS GX10 / DGX Spark (arm64)
    Hostname: gx10-4323
    DGX release package: dgx-release 7.4.0
    /etc/dgx-release: SWBUILD 7.2.3; OTA entries include 7.3.1 and 7.4.0 (2026-02-05 20:12:54 +05)
    OS: Ubuntu 24.04.3 LTS
    Kernel (runtime): 6.14.0-1015-nvidia
    Firmware: GX10DGX.0102.2025.1111.1531 (2025-11-11)

    NVIDIA driver (nvidia-smi): 580.126.09 (CUDA Version reported by driver: 13.0)
    NVRM (/proc/driver/nvidia/version): 580.126.09
    modinfo nvidia: version 580.126.09; vermagic 6.14.0-1015-nvidia

    CUDA toolkit:

    • cuda-toolkit-13-0 13.0.2-1
    • cuda-nvcc-13-0 13.0.88-1
      nvcc --version: Cuda compilation tools, release 13.0, V13.0.88

    NVIDIA Container Toolkit:

    • nvidia-container-toolkit 1.18.2-1 (libnvidia-container1 1.18.2-1)

    Docker Engine:

    • docker-ce 29.1.3 (docker compose plugin reports v5.0.1)

How I run the workload (this is what triggers it):

docker stop vllm-gptoss120b-mxfp4 || true
docker rm vllm-gptoss120b-mxfp4 || true

docker run -d \
  --name vllm-gptoss120b-mxfp4 \
  --restart on-failure:3 \
  --gpus all \
  --network host \
  --ipc=host \
  --memory=110g --memory-swap=110g \
  -v $HOME/models/GPT-OSS-120B:/model:ro \
  vllm-node-mxfp4:latest \
  vllm serve /model \
    --host 0.0.0.0 \
    --port 8888 \
    --served-model-name gpt-oss-120b \
    --enable-auto-tool-choice \
    --tool-call-parser openai \
    --reasoning-parser openai_gptoss \
    --load-format fastsafetensors \
    --quantization mxfp4 \
    --mxfp4-backend CUTLASS \
    --mxfp4-layers moe,qkv,o,lm_head \
    --attention-backend FLASHINFER \
    --kv-cache-dtype fp8 \
    --enforce-eager \
    --gpu-memory-utilization 0.72 \
    --enable-chunked-prefill \
    --max-num-batched-tokens 1024 \
    --max-num-seqs 1 \
    --swap-space 1 \
    --max-model-len 131072
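For reference, this is the kind of telemetry logger I run alongside the container to get the power/temperature readings mentioned above (a sketch; the script path, log path, and 5 s interval are my own choices, not anything DGX-specific). Each sample is synced to disk immediately so the last few lines survive a hard power-off:

```shell
# Write the telemetry logger to a script file (paths/interval are arbitrary choices).
cat > /tmp/gx10-telemetry.sh <<'EOF'
#!/usr/bin/env bash
# Append a timestamped power/temperature/memory sample every 5 s and force it
# to disk, so the last samples before a hard reset are not lost in page cache.
LOG=/var/log/gx10-telemetry.csv
while true; do
  ts=$(date -Is)
  sample=$(nvidia-smi --query-gpu=power.draw,temperature.gpu,memory.used \
                      --format=csv,noheader,nounits 2>/dev/null)
  echo "$ts,$sample" >> "$LOG"
  sync "$LOG"   # flush this file's data to disk (coreutils sync accepts a file arg)
  sleep 5
done
EOF
chmod +x /tmp/gx10-telemetry.sh
bash -n /tmp/gx10-telemetry.sh && echo "syntax OK"
```

After a reset, `tail /var/log/gx10-telemetry.csv` shows the last readings captured before the cut.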

What happens:

  • The SSH session (and any monitoring) abruptly disconnects: Software caused connection abort.

  • The machine becomes unreachable on the network until I manually power it back on.

  • After boot, journald reports an unclean shutdown (so it wasn’t a normal OS shutdown).

Evidence for the most recent reset (local time +05, boot -1 → boot 0):

  • /var/log/syslog shows the last line before the reset, then immediately the next kernel boot:

    • 37510:2026-02-05T19:15:22.919929+05:00 gx10-4323 systemd[1]: session-14.scope: Deactivated successfully.

    • 37511:2026-02-05T19:38:34.828912+05:00 gx10-4323 kernel: Booting Linux on physical CPU 0x0000000000 [0x410fd871]

  • journalctl -b 0 (current boot) shows the “unclean shutdown” marker:

    • Feb 05 19:38:33 gx10-4323 systemd-journald[634]: File …/system.journal corrupted or uncleanly shut down, renaming and replacing.

  • The same “unclean shutdown” line is also present in /var/log/syslog:

    • /var/log/syslog: 39248:2026-02-05T19:38:34.830923+05:00 … systemd-journald[634]: File …system.journal corrupted or uncleanly shut down, renaming and replacing.

Important: for this specific reset window, I could not find any NVRM/Xid/OOM lines right before the cut:

  • journalctl -b -1 -k | egrep 'nvCheckOkFailed|NV_ERR_NO_MEMORY|Out of memory|Xid' → no matches.
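To check whether those markers show up on any retained boot (not just boot -1), I sweep the kernel logs of the last few boots with a small script like this (a sketch; the boot depth of 5 is an arbitrary choice and depends on how many boots journald has retained):

```shell
# Write a sweep script that greps each recent boot's kernel log for the
# NVRM/OOM/Xid markers. Depth of 5 boots is arbitrary.
cat > /tmp/scan-boots.sh <<'EOF'
#!/usr/bin/env bash
for b in 1 2 3 4 5; do
  echo "== boot -$b =="
  journalctl -b "-$b" -k --no-pager 2>/dev/null \
    | grep -E 'nvCheckOkFailed|NV_ERR_NO_MEMORY|Out of memory|Xid' \
    || echo "(no matches)"
done
EOF
bash -n /tmp/scan-boots.sh && echo "syntax OK"
```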

However, on other earlier runs (same host), I did capture NVIDIA kernel driver OOM evidence: NVRM nvCheckOkFailed lines reporting NV_ERR_NO_MEMORY from _memdescAllocInternal.

Also, on each boot I see:

  • journalctl -b 0 -k:

    • mlx5_core … Detected insufficient power on the PCIe slot (27W). (multiple lines)

Questions:

  1. Is a hard power-off/reset under vLLM long-context load a known issue on GB10 / DGX OS (e.g., driver OOM/hang that doesn’t always flush logs)?

  2. Does the NVRM NV_ERR_NO_MEMORY / _memdescAllocInternal pattern match a known bug (and is there a fix/workaround)?

  3. What’s the best way to capture useful diagnostics for NVIDIA when the box resets abruptly (e.g., recommended logging, pstore, nvidia-bug-report options) so the moment of failure is not lost?
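In the meantime, this is what I've set up toward question 3 (a sketch; whether /sys/fs/pstore actually gets populated after a reset depends on firmware support on this board, and the output path for the bug report is my own choice):

```shell
# Write the post-reset capture steps to a script (needs root when actually run).
cat > /tmp/post-reset-capture.sh <<'EOF'
#!/usr/bin/env bash
set -u
# 1) Make journald persist across reboots: creating /var/log/journal switches
#    it to persistent storage on the next restart.
sudo mkdir -p /var/log/journal
sudo systemctl restart systemd-journald

# 2) After an unclean reset, check whether the kernel left anything in pstore
#    (only populated if the firmware/board supports a pstore backend).
ls -l /sys/fs/pstore/ 2>/dev/null || echo "pstore empty or unsupported"

# 3) Grab a full driver-state snapshot to attach to an NVIDIA support ticket.
sudo nvidia-bug-report.sh --output-file "/var/tmp/nvbug-$(date +%F-%H%M).log"
EOF
bash -n /tmp/post-reset-capture.sh && echo "syntax OK"
```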

One more observation: gpt-oss sometimes goes off the rails and keeps repeating the same output indefinitely. Reading around, I saw reports of significant performance degradation beyond ~30k tokens of context, even though the advertised maximum is six digits.

I've since switched models and lowered the max context length, so I can't say which change helped, but the runaway repetition no longer happens.