A big thank you to @Eugr, @DBSIDBSI, and @Raphael.Amorin — what you’re building with SparkRun and SparkArena is genuinely changing what’s possible on the DGX Spark.
For anyone who hasn’t come across it yet: SparkRun is a CLI tool that handles the entire lifecycle of loading a model onto the DGX Spark — container orchestration, vLLM configuration, recipe management — in a single command:
```
sparkrun run qwen3-coder-next-vllm
```
That’s it. The model is loaded, vLLM is serving on port 8000, and you’re ready to go. SparkArena extends this with a curated recipe hub where the community is actively publishing optimised model configs for the GB10 architecture — FP8 quants, MoE models, Nemotron variants tuned for the eugr runtime. The pace of what’s appearing there is impressive.
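Once the model is up, the vLLM endpoint on port 8000 speaks the OpenAI-compatible chat API, so you can talk to it with nothing but the standard library. A minimal sketch (the model name and prompt here are placeholders, not taken from any particular recipe):

```python
import json
import urllib.request

# vLLM serves an OpenAI-compatible API; port 8000 as SparkRun configures it.
VLLM_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat payload that vLLM accepts."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

def ask(model: str, prompt: str) -> str:
    """POST the request to the local vLLM server and return the reply text."""
    req = urllib.request.Request(
        VLLM_URL,
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Usage (with a model loaded):
#   ask("qwen3-coder-next", "Write a haiku about unified memory.")
```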
What I’ve built on top of it
The one thing the DGX Spark workflow was still missing was seamless integration with the broader AI tooling ecosystem — Open WebUI, Langflow, Claude Code CLI, anything built on the Anthropic SDK. These tools expect claude-sonnet-4-5, not Qwen/Qwen3-30B-A3B.
So I put together a small stack on top of SparkRun:
- LiteLLM proxy that maps all `claude-*` model names to whatever vLLM is currently serving
- `sparkrun_sync.py` — a daemon that polls vLLM every 30 seconds and auto-registers/deregisters presets in the LiteLLM database as SparkRun loads and unloads models. No proxy restart needed
- `status.html` — a zero-dependency browser dashboard showing live status: model loaded, aliases registered, copy-to-clipboard Claude Code launch commands
- `smoke_test.py` — end-to-end verification from vLLM through to the Anthropic SDK in one command

Everything is open source: GitHub - MARKYMARK55/spark-model-gateway: SparkRun auto model registration with LiteLLM + claude-* alias setup for local inference
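To make the poll-and-register idea concrete, here is a hypothetical sketch of the daemon's core decision logic — this is not the actual `sparkrun_sync.py` code, and the alias list and function names are assumptions. The pure "what changed?" step is separated out so the LiteLLM registration calls can be swapped in however the proxy is configured:

```python
import json
import urllib.request

# Hypothetical set of claude-* aliases the proxy should expose.
CLAUDE_ALIASES = ["claude-sonnet-4-5", "claude-opus-4-1", "claude-haiku-4-5"]

def served_models(vllm_url: str = "http://localhost:8000/v1/models") -> list[str]:
    """Ask vLLM which model IDs it is currently serving."""
    with urllib.request.urlopen(vllm_url) as resp:
        return [m["id"] for m in json.load(resp)["data"]]

def plan_sync(served: list[str], registered: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Compute which aliases to (re)register and which to deregister.

    `registered` maps alias -> backing model; aliases pointing at a model
    vLLM no longer serves get dropped.
    """
    target = served[0] if served else None
    to_add = {a: target for a in CLAUDE_ALIASES if target and registered.get(a) != target}
    to_remove = [a for a, m in registered.items() if m not in served]
    return to_add, to_remove

# The 30-second daemon loop would then apply the plan against LiteLLM's
# admin API (details depend on your proxy setup):
#   while True:
#       add, drop = plan_sync(served_models(), current_registrations())
#       ...register `add`, deregister `drop`...
#       time.sleep(30)
```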
The result: load any SparkRun model, run the sync daemon once, and every application that uses the Anthropic SDK — including Claude Code CLI — routes to your local model automatically. Swap models and the registrations update within 30 seconds.
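What "routes automatically" looks like from an application's side: the app sends a normal Anthropic Messages API request with a `claude-*` model name, and the proxy rewrites it to the local model. A stdlib-only sketch — the port (LiteLLM's default 4000), the key, and the alias are assumptions for illustration:

```python
import json
import urllib.request

# LiteLLM's default listen port is 4000; adjust to your proxy config.
PROXY_URL = "http://localhost:4000/v1/messages"

def build_messages_request(prompt: str, model: str = "claude-sonnet-4-5") -> dict:
    """Anthropic Messages API payload; the proxy maps `model` to the served one."""
    return {
        "model": model,
        "max_tokens": 512,
        "messages": [{"role": "user", "content": prompt}],
    }

def call_local_claude(prompt: str, api_key: str = "sk-local") -> str:
    """Send the request through the proxy and return the reply text."""
    req = urllib.request.Request(
        PROXY_URL,
        data=json.dumps(build_messages_request(prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,  # LiteLLM virtual key, not a real Anthropic key
            "anthropic-version": "2023-06-01",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["content"][0]["text"]
```

Tools built on the Anthropic SDK can typically be pointed at the proxy the same way by overriding the base URL (for example via the `ANTHROPIC_BASE_URL` environment variable), with no code changes.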
The DGX Spark with 128 GB unified memory running Qwen3-Coder-Next-FP8 through Claude Code is a genuinely powerful local coding setup. None of this would be possible without the foundation SparkRun and SparkArena provide — so thank you for building it.
If you found this useful, please give it a like on the NVIDIA forum and a star on the GitHub repo.
Many thanks,
– Mark