Hello everyone,
I built an open source live dashboard for LLM inference rigs and I’m looking for fellow DGX Spark owners to test it.
It’s a showcase / live-view tool, not a production metrics stack. It puts Nvidia GPU telemetry (NVML) and vLLM inference metrics on one screen, so you don’t have to juggle nvidia-smi and vLLM’s /metrics endpoint in separate terminals.
Shows:
- GPU: utilization, VRAM, power, temps, clocks
- vLLM: tokens/sec, TTFT, ITL, queue depth, active/waiting requests, KV cache, prefix cache hits
- Works with vLLM on host or in Docker
- Auto-detects multiple vLLM instances
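For context, the vLLM side of this boils down to scraping the Prometheus-format /metrics endpoint that a vLLM server exposes. Here's a minimal sketch of that half (the port 8000 default and the exact `vllm:`-prefixed metric names are assumptions; they vary by vLLM version, so check your server's actual /metrics output):

```python
# Minimal sketch: poll a vLLM /metrics endpoint (Prometheus text format)
# and pull out a few gauges. Not the dashboard's actual code.
import re
import urllib.request

def parse_prometheus(text):
    """Parse simple Prometheus exposition lines into {metric_name: value}.

    Skips comments/blank lines; if a metric appears with multiple label
    sets, the last value wins (fine for a quick sketch, not for real use).
    """
    metrics = {}
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue
        m = re.match(r"([\w:]+)(\{[^}]*\})?\s+([-+0-9.eE]+)", line)
        if m:
            metrics[m.group(1)] = float(m.group(3))
    return metrics

def fetch_vllm_metrics(url="http://localhost:8000/metrics"):
    # Default vLLM OpenAI-server port assumed; adjust to your setup.
    with urllib.request.urlopen(url, timeout=2) as resp:
        return parse_prometheus(resp.read().decode())

if __name__ == "__main__":
    # Offline demo with a fabricated exposition snippet:
    sample = (
        "# HELP vllm:num_requests_running Number of running requests\n"
        'vllm:num_requests_running{model_name="m"} 3.0\n'
        'vllm:num_requests_waiting{model_name="m"} 1.0\n'
    )
    print(parse_prometheus(sample))
```

Against a live server you'd call `fetch_vllm_metrics()` in a loop; the dashboard layers NVML readings on top of numbers like these.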
It was developed on my own Spark, so I’d like to validate it on other units before calling it done. Spark-specific things I’d especially like checked:
- Unified memory reporting on GB10
- vLLM configs typical for Spark (FP8 / NVFP4, MoE, TP on unified memory)
- Docker setups with the Nvidia container runtime
Repo: https://github.com/niklasfrick/spark-dashboard
Feedback welcome here or as GitHub issues. Thanks to anyone who gives it a try.