Looking for testers: open source live dashboard for GPU + vLLM inference metrics

Hello everyone,

I built an open source live dashboard for LLM inference rigs and I’m looking for fellow DGX Spark owners to test it.

It’s a showcase / live-view tool, not a production metrics stack: one screen that combines Nvidia GPU telemetry (via NVML) with vLLM inference metrics, so you don’t have to juggle nvidia-smi and vLLM’s /metrics endpoint in separate terminals.
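
Under the hood the idea is just a poll loop: sample NVML through nvidia-ml-py and scrape vLLM’s Prometheus /metrics in the same tick. A rough sketch of that idea (not the dashboard’s actual code; it assumes GPU index 0 and a vLLM server on localhost:8000):

```python
# Rough sketch only: poll NVML and vLLM's /metrics endpoint together.
import time
import requests
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)  # assumes the GPU shows up as device 0

def gpu_sample():
    util = pynvml.nvmlDeviceGetUtilizationRates(gpu)
    mem = pynvml.nvmlDeviceGetMemoryInfo(gpu)
    return {
        "gpu_util_pct": util.gpu,
        "vram_used_gib": round(mem.used / 2**30, 1),
        "power_w": pynvml.nvmlDeviceGetPowerUsage(gpu) / 1000,  # NVML reports milliwatts
        "temp_c": pynvml.nvmlDeviceGetTemperature(gpu, pynvml.NVML_TEMPERATURE_GPU),
        "sm_clock_mhz": pynvml.nvmlDeviceGetClockInfo(gpu, pynvml.NVML_CLOCK_SM),
    }

def vllm_sample(url="http://localhost:8000/metrics"):
    # Prometheus text format: one "name{labels} value" line per sample.
    text = requests.get(url, timeout=2).text
    return [line for line in text.splitlines() if line.startswith("vllm:")]

while True:
    print(gpu_sample(), "|", len(vllm_sample()), "vLLM metric lines")
    time.sleep(1)
```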

What it shows:

  • GPU: utilization, VRAM, power, temps, clocks

  • vLLM: tokens/sec, TTFT, ITL, queue depth, active/waiting requests, KV cache usage, prefix cache hits (a scrape sketch follows this list)

It also:

  • Works with vLLM on the host or in Docker

  • Auto-detects multiple vLLM instances
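
The vLLM side of that list comes straight off the Prometheus exposition at /metrics. Metric names have shifted between vLLM versions, so treat the names below as examples and check your own /metrics output; this sketch (assuming prometheus-client is installed and a server on localhost:8000) just pulls a few gauges:

```python
# Hedged sketch: read a few vLLM gauges from the Prometheus text format.
# Names vary across vLLM versions; check your own /metrics output first.
import requests
from prometheus_client.parser import text_string_to_metric_families

WANTED = {
    "vllm:num_requests_running",  # active requests
    "vllm:num_requests_waiting",  # queue depth
    "vllm:gpu_cache_usage_perc",  # KV cache usage, 0..1
}

def scrape(url="http://localhost:8000/metrics"):
    out = {}
    for family in text_string_to_metric_families(requests.get(url, timeout=2).text):
        for sample in family.samples:
            if sample.name in WANTED:
                # vLLM labels samples with model_name, handy when more than one model is served.
                out[(sample.name, sample.labels.get("model_name", ""))] = sample.value
    return out

print(scrape())
```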

It was developed on my own Spark, so I’d like to validate it on other units before calling it done. Spark-specific things I’d especially like checked:

  • Unified memory reporting on GB10 (quick comparison check below)

  • vLLM configs typical for Spark (FP8 / NVFP4, MoE, TP on unified memory)

  • Docker setups with the Nvidia container runtime
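
On the unified memory point, the most useful comparison for me is whether the dashboard’s VRAM numbers line up with what raw NVML and the OS report on your unit. A quick check you could run (nvidia-ml-py required, device 0 assumed; how NVML partitions the shared pool on GB10 is exactly what I’m unsure about):

```python
# Compare raw NVML memory numbers against /proc/meminfo on a unified-memory box.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"NVML: total={mem.total/2**30:.1f} GiB used={mem.used/2**30:.1f} GiB free={mem.free/2**30:.1f} GiB")
pynvml.nvmlShutdown()

# System view of the same (shared) memory pool.
with open("/proc/meminfo") as f:
    meminfo = dict(line.split(":", 1) for line in f)
mem_total_kib = int(meminfo["MemTotal"].split()[0])
print(f"/proc/meminfo: MemTotal={mem_total_kib/2**20:.1f} GiB")
```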

Repo: https://github.com/niklasfrick/spark-dashboard

Feedback welcome here or as GitHub issues. Thanks to anyone who gives it a try.
