Hey all, sharing a tool I built while setting up my local AI stack on the DGX Spark. It's a lightweight web-based model manager that gives you a single browser tab to control everything: pull Ollama models, download from HuggingFace, manage LiteLLM routing, and start/stop SGLang. No YAML editing, no CLI juggling.
What it does:
- Pull any Ollama model with live download progress and MB counter
- One-click wildcard LiteLLM routing — every model you pull is instantly available to all your apps at :4000 without touching config again
- HuggingFace Hub download with streaming progress, lands in the standard HF cache for SGLang/vLLM compatibility
- SGLang start/stop via configurable launch profiles — just add a profile block pointing at your start script and it appears in the UI
- Live status bar polling SGLang, Ollama, and LiteLLM health every 12 seconds
- Built-in /help docs page
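
For anyone curious how a live progress and MB counter like the one in the first bullet can work: Ollama's /api/pull endpoint streams one JSON object per line, and the layer download events carry `total` and `completed` byte counts. Below is a minimal Python sketch of turning those lines into a progress string. The function names `parse_pull_line` and `format_progress` are made up for this example and are not taken from the tool's code, so treat it as an illustration rather than the actual implementation.

```python
import json


def parse_pull_line(line: str) -> dict:
    """Turn one JSON line from a streamed pull response into a progress record."""
    event = json.loads(line)
    total = event.get("total")
    completed = event.get("completed", 0)
    record = {
        "status": event.get("status", ""),
        "done_mb": None,
        "total_mb": None,
        "percent": None,
    }
    # Layer events carry byte counts; manifest and success events do not.
    if total:
        record["done_mb"] = round(completed / 1_000_000, 1)
        record["total_mb"] = round(total / 1_000_000, 1)
        record["percent"] = round(100 * completed / total, 1)
    return record


def format_progress(record: dict) -> str:
    """Render a record as a one-line status string for the UI."""
    if record["total_mb"] is None:
        return record["status"]
    return (
        f'{record["status"]}: {record["done_mb"]} / '
        f'{record["total_mb"]} MB ({record["percent"]}%)'
    )


if __name__ == "__main__":
    # Hand-written sample lines shaped like a pull stream (not real output).
    sample = [
        '{"status": "pulling manifest"}',
        '{"status": "pulling layer", "total": 2000000000, "completed": 500000000}',
        '{"status": "success"}',
    ]
    for raw in sample:
        print(format_progress(parse_pull_line(raw)))
```

In a real client you would read the HTTP response line by line and feed each line to `parse_pull_line`, pushing the result to the browser as it arrives.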
GB10 note: The SGLang profile system lets you define flags per model. The docs include a flag reference table specific to the GB10 / SM121A. For example, --attention-backend triton is marked as required, while --quantization modelopt_fp4 and --fp4-gemm-backend are flagged as incompatible.
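
To make the GB10 flag rules concrete, here is a rough Python sketch of a checker that takes a profile's flag list and reports anything that conflicts with the three rules mentioned above. The rules themselves come from this post; the data layout and the names `parse_flags` and `check_gb10_flags` are hypothetical and are not how the tool stores its profiles.

```python
from typing import Dict, List, Optional

# Rules as stated in the post; the structure here is only an illustration.
REQUIRED = {"--attention-backend": "triton"}
INCOMPATIBLE_FLAGS = {"--fp4-gemm-backend"}
INCOMPATIBLE_SETTINGS = {("--quantization", "modelopt_fp4")}


def parse_flags(args: List[str]) -> Dict[str, Optional[str]]:
    """Map each --flag to its value; bare switches map to None."""
    flags: Dict[str, Optional[str]] = {}
    i = 0
    while i < len(args):
        token = args[i]
        if token.startswith("--"):
            if "=" in token:
                # Handle the --flag=value spelling.
                name, value = token.split("=", 1)
                flags[name] = value
            elif i + 1 < len(args) and not args[i + 1].startswith("--"):
                # Handle the "--flag value" spelling and skip the value token.
                flags[token] = args[i + 1]
                i += 1
            else:
                flags[token] = None
        i += 1
    return flags


def check_gb10_flags(args: List[str]) -> List[str]:
    """Return a list of human-readable problems; empty means no conflicts found."""
    flags = parse_flags(args)
    problems = []
    for name, value in REQUIRED.items():
        if flags.get(name) != value:
            problems.append(f"missing required {name} {value}")
    for name in sorted(INCOMPATIBLE_FLAGS):
        if name in flags:
            problems.append(f"incompatible flag {name}")
    for name, value in sorted(INCOMPATIBLE_SETTINGS):
        if flags.get(name) == value:
            problems.append(f"incompatible setting {name} {value}")
    return problems


if __name__ == "__main__":
    print(check_gb10_flags(["--quantization", "modelopt_fp4"]))
```

A check like this could run before a profile is launched, so a bad flag shows up as a warning in the UI instead of a crash in the SGLang logs.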
It's built with FastAPI plus an embedded HTML/JS frontend and runs as a systemd service. A single config.json points it at your services, and it works out of the box with the standard DGX Spark stack (Ollama on :11434, SGLang on :30000, LiteLLM on :4000).
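
As a rough idea of how a single config file and a health poll fit together, here is a small Python sketch. It falls back to the default ports listed above when config.json does not override them, then probes each service once. The key names (`ollama_url`, `sglang_url`, `litellm_url`), the health paths, and the function names are assumptions made for this example; the real config.json schema and the endpoints the tool polls may differ.

```python
import http.client
import json
import urllib.error
import urllib.request
from pathlib import Path

# Default ports from the post; the key names are hypothetical.
DEFAULTS = {
    "ollama_url": "http://localhost:11434",
    "sglang_url": "http://localhost:30000",
    "litellm_url": "http://localhost:4000",
}

# Assumed health paths for each service; adjust to your setup.
HEALTH_PATHS = {
    "ollama_url": "/",
    "sglang_url": "/health",
    "litellm_url": "/health/liveliness",
}


def merge_config(overrides: dict) -> dict:
    """Overlay user-supplied values on top of the defaults."""
    config = dict(DEFAULTS)
    config.update(overrides)
    return config


def load_config(path: str) -> dict:
    """Read config.json if it exists, otherwise use the defaults only."""
    file = Path(path)
    if not file.exists():
        return merge_config({})
    return merge_config(json.loads(file.read_text()))


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return False


def poll_once(config: dict, probe=is_up) -> dict:
    """Probe every known service once and return {config_key: bool}."""
    return {
        key: probe(config[key].rstrip("/") + path)
        for key, path in HEALTH_PATHS.items()
    }


if __name__ == "__main__":
    print(poll_once(load_config("config.json")))
```

Calling `poll_once` on a timer (every 12 seconds in the tool's case) and sending the result to the frontend is enough to drive a simple status bar. The `probe` parameter is there so the polling logic can be tested without any services running.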
Setup is one script: bash setup.sh. It creates a venv, installs deps, and optionally adds UFW rules and a systemd service. Happy to answer questions or take PRs.
Hit me up if you have any feedback!




