A big thank you to @Eugr, @DBSIDBSI, and @Raphael.Amorin — what you’re building with SparkRun and SparkArena is genuinely changing what’s possible on the DGX Spark.
For anyone who hasn’t come across it yet: SparkRun is a CLI tool that handles the entire lifecycle of loading a model onto the DGX Spark — container orchestration, vLLM configuration, recipe management — in a single command:
```
sparkrun run qwen3-coder-next-vllm
```
That’s it. The model is loaded, vLLM is serving on port 8000, and you’re ready to go. SparkArena extends this with a curated recipe hub where the community is actively publishing optimised model configs for the GB10 architecture — FP8 quants, MoE models, Nemotron variants tuned for the eugr runtime. The pace of what’s appearing there is impressive.
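Once the model is up, the vLLM endpoint on port 8000 speaks the OpenAI-compatible chat API, so you can talk to it with nothing but the standard library. A minimal sketch (the model name and prompt here are placeholders, not taken from any particular recipe):

```python
import json
import urllib.request

# vLLM serves an OpenAI-compatible API; port 8000 as SparkRun configures it.
VLLM_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat payload that vLLM accepts."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

def ask(model: str, prompt: str) -> str:
    """POST the request to the local vLLM server and return the reply text."""
    req = urllib.request.Request(
        VLLM_URL,
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Usage (with a model loaded):
#   ask("qwen3-coder-next", "Write a haiku about unified memory.")
```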
What I’ve built on top of it
The one thing the DGX Spark workflow was still missing was seamless integration with the broader AI tooling ecosystem — Open WebUI, Langflow, Claude Code CLI, anything built on the Anthropic SDK. These tools expect claude-sonnet-4-5, not Qwen/Qwen3-30B-A3B.
So I put together a small stack on top of SparkRun:
- LiteLLM proxy that maps all `claude-*` model names to whatever vLLM is currently serving
- `sparkrun_sync.py` — a daemon that polls vLLM every 30 seconds and auto-registers/deregisters presets in the LiteLLM database as SparkRun loads and unloads models. No proxy restart needed
- `status.html` — a zero-dependency browser dashboard showing live status: model loaded, aliases registered, copy-to-clipboard Claude Code launch commands
- `smoke_test.py` — end-to-end verification from vLLM through to the Anthropic SDK in one command

Everything is open source: GitHub - MARKYMARK55/spark-model-gateway: SparkRun auto model registration with LiteLLM + claude-* alias setup for local inference
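To make the poll-and-register idea concrete, here is a hypothetical sketch of the daemon's core decision logic — this is not the actual `sparkrun_sync.py` code, and the alias list and function names are assumptions. The pure "what changed?" step is separated out so the LiteLLM registration calls can be swapped in however the proxy is configured:

```python
import json
import urllib.request

# Hypothetical set of claude-* aliases the proxy should expose.
CLAUDE_ALIASES = ["claude-sonnet-4-5", "claude-opus-4-1", "claude-haiku-4-5"]

def served_models(vllm_url: str = "http://localhost:8000/v1/models") -> list[str]:
    """Ask vLLM which model IDs it is currently serving."""
    with urllib.request.urlopen(vllm_url) as resp:
        return [m["id"] for m in json.load(resp)["data"]]

def plan_sync(served: list[str], registered: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Compute which aliases to (re)register and which to deregister.

    `registered` maps alias -> backing model; aliases pointing at a model
    vLLM no longer serves get dropped.
    """
    target = served[0] if served else None
    to_add = {a: target for a in CLAUDE_ALIASES if target and registered.get(a) != target}
    to_remove = [a for a, m in registered.items() if m not in served]
    return to_add, to_remove

# The 30-second daemon loop would then apply the plan against LiteLLM's
# admin API (details depend on your proxy setup):
#   while True:
#       add, drop = plan_sync(served_models(), current_registrations())
#       ...register `add`, deregister `drop`...
#       time.sleep(30)
```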
The result: load any SparkRun model, run the sync daemon once, and every application that uses the Anthropic SDK — including Claude Code CLI — routes to your local model automatically. Swap models and the registrations update within 30 seconds.
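What "routes automatically" looks like from an application's side: the app sends a normal Anthropic Messages API request with a `claude-*` model name, and the proxy rewrites it to the local model. A stdlib-only sketch — the port (LiteLLM's default 4000), the key, and the alias are assumptions for illustration:

```python
import json
import urllib.request

# LiteLLM's default listen port is 4000; adjust to your proxy config.
PROXY_URL = "http://localhost:4000/v1/messages"

def build_messages_request(prompt: str, model: str = "claude-sonnet-4-5") -> dict:
    """Anthropic Messages API payload; the proxy maps `model` to the served one."""
    return {
        "model": model,
        "max_tokens": 512,
        "messages": [{"role": "user", "content": prompt}],
    }

def call_local_claude(prompt: str, api_key: str = "sk-local") -> str:
    """Send the request through the proxy and return the reply text."""
    req = urllib.request.Request(
        PROXY_URL,
        data=json.dumps(build_messages_request(prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,  # LiteLLM virtual key, not a real Anthropic key
            "anthropic-version": "2023-06-01",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["content"][0]["text"]
```

Tools built on the Anthropic SDK can typically be pointed at the proxy the same way by overriding the base URL (for example via the `ANTHROPIC_BASE_URL` environment variable), with no code changes.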
The DGX Spark with 128 GB unified memory running Qwen3-Coder-Next-FP8 through Claude Code is a genuinely powerful local coding setup. None of this would be possible without the foundation SparkRun and SparkArena provide — so thank you for building it.
If you found this useful, please give it a like on the NVIDIA forum and a star on the GitHub repo.
Many thanks,
– Mark