DGX Spark Model Manager — Open Source Web UI for Ollama, SGLang & LiteLLM

Hey all — sharing a tool I built while setting up my local AI stack on the Spark. It’s a lightweight web-based model manager that gives you a single browser tab to control everything: pull Ollama models, download from HuggingFace, manage LiteLLM routing, and start/stop SGLang — no YAML editing, no CLI juggling.

What it does:

  • Pull any Ollama model with live download progress and MB counter
  • One-click wildcard LiteLLM routing — every model you pull is instantly available to all your apps at :4000 without touching config again
  • HuggingFace Hub download with streaming progress, lands in the standard HF cache for SGLang/vLLM compatibility
  • SGLang start/stop via configurable launch profiles — just add a profile block pointing at your start script and it appears in the UI
  • Live status bar polling SGLang, Ollama, and LiteLLM health every 12 seconds
  • Built-in /help docs page

GB10 note: The SGLang profile system lets you define flags per model. The docs include a flag reference table specific to the GB10 / SM121A — e.g. --attention-backend triton required, --quantization modelopt_fp4 and --fp4-gemm-backend are flagged as incompatible.

Built with FastAPI + embedded HTML/JS frontend, runs as a systemd service. Single config.json to point it at your services — works out of the box with the standard DGX Spark stack (Ollama on :11434, SGLang on :30000, LiteLLM on :4000).

GitHub: GitHub - calico88x/DGX-Model-Manager: A web-based model manager for the NVIDIA DGX Spark — pull Ollama models, download from HuggingFace, manage LiteLLM routing, and control SGLang from one browser tab. No YAML editing required. · GitHub

Setup is one script: bash setup.sh — it creates a venv, installs deps, optionally adds UFW rules and a systemd service. Happy to answer questions or take PRs.

Hit me up if you have any feedback!

too good, magical, um

This looks great! Can you add vllm support?

I am working on adding vllm support right now and I will post the update here.

The Model Manager [v0.0.5a] should now have vLLM support in its routing.

Let me know if this works for you, I don’t use vLLM but it should pick up your config and route it through LiteLLM, point your apps to LiteLLM and manage the model loading through the app. 🤞

I’d like to post the latest update on the 🔰DGX Model Manager

  • Confirmed vLLM functional - Worked downloading & loading Nemotron 3 Super NVFP4
  • Clear descriptions on where scripts live ~/vLLM/ < scripts go here >
  • HF Download page has model inventory with parameter detection (experimental)
  • Better progress display on download
  • Better progress display on model loading

vLLM Engine:

HF Downloads:

New version 0.1.1

  • 5 Inference Engines : Start/stop SGLang, vLLM, llama.cpp, LocalAI, and ComfyUI via configurable profiles
  • 🤗HuggingFace Browser : Search HF Hub, discover quantized variants, preview files, one-click download
  • Live Status Bar : Real-time health indicators for all 7 services
  • Logs & Diagnostics : System overview, app logs, engine logs, LiteLLM journalctl, Docker state — all in-browser
  • Settings & Security : Configurable service URLs, optional API key authentication, connectivity testing
  • Performance tweaks, better fault tolerance

More services

New screens :

Huggingface Hub

Logs and Debug