Thank you, it’s working!
@dbsci I know it probably wasn’t made for this, but I was wondering if a recipe could run Open WebUI? I haven’t gotten it working but here’s the recipe:
Recipes get processed by runtime implementations (of which there are several, e.g.: vllm-ray, vllm-distributed, sglang, llama.cpp, etc.). By default, sparkrun has a lot of plumbing for distributing HF models, etc., that you’d need to gracefully disable. So I’m not surprised it doesn’t work out of the box, but…
Technically you could create a “runtime” for arbitrary commands and do that… the general design is that sparkrun expects the runtime implementation to guide what is needed to use that runtime. Most of the important parts of sparkrun are basically the “primitives” that are used for coordinating various activities without having to rewrite them every time since they’re basically always the same. Things like “wait for port”, “run command”, “distribute docker image”, etc.
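As an illustration, a “wait for port” primitive along those lines might look like this. This is a minimal sketch, not sparkrun’s actual internals; the function name and signature are assumptions:

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Poll until a TCP port accepts connections, or the timeout expires.

    Returns True as soon as a connection succeeds, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection performs the full TCP handshake, so a
            # listening-but-busy server still counts as "up".
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

A coordinator can call this after launching a container to block until the inference server is actually accepting traffic.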
Also PS. your current recipe is missing model and runtime – which are considered required. Recipe Format | sparkrun
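For reference, the only fields confirmed required here are model and runtime; a minimal recipe might look roughly like this (YAML shape and the specific values are illustrative assumptions, see the Recipe Format docs for the authoritative schema):

```yaml
# Hypothetical minimal recipe: only "model" and "runtime" are confirmed
# required; the values shown are placeholders.
model: Qwen/Qwen3-1.7B
runtime: vllm-ray
```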
I guess tell me more specifically what you want to accomplish… Is this to help run on a remote system? I think we could force sparkrun to do this, but I suspect it might be cleaner to do it without sparkrun. Although if you need some of the remote functionality, technically you could also use sparkrun as a library and then just leverage its primitives to help you with coordination activities… although since library use hasn’t been a goal so far, I wouldn’t guarantee stability of the internal API yet (even though I try to keep it pretty stable).
Every time we launch a vLLM cluster using the Ray panel, a very useful Ray dashboard is created; we just need to connect to it. This could potentially be integrated into the excellent tools built by the community, such as a mod or sparkrun.
Actually, what I’m looking for is a quick way to test models directly in the browser, similar to how llama.cpp works. It’d also be awesome to have mobile support, like testing on an iPad or iPhone.
I know I could just fire up a shell script to launch Open WebUI, but it would be way more convenient to get it running under sparkrun.
Let me think about it for a bit. I think there could be a relatively clean way to add post-start functionality that still fits the core concept. Because, e.g., you don’t need to pipe through the docker socket and do docker-in-docker; you basically just need to launch another command that might take or inherit some of the defaults/params.
It also makes sense that the webui part is somewhat separated from the recipe since the recipe would handle the model details and webui would be the same across all uses.
It’s not entirely different from sparkrun benchmark, actually, so technically one way to force this in would be to create a webui “benchmarking framework”: essentially webui is a never-ending benchmark, and when you close webui, it closes the underlying model. And you could implement it as a benchmarking profile instead of as a recipe (so that recipes stay intact for models). But I don’t like how much of a hack that is… and since benchmark is so close in functionality… maybe we should just make a generic capability for that, and then build benchmark on top of it…
@aceangel FYI I’m playing with a few major changes to sparkrun that could be used to enable what you’re doing (and more).
I’ve been using sparkrun for a while and it’s been great! Perhaps I haven’t explored it completely, but one capability I think would benefit me and others is local invocation against completely remote targets. When I run it locally, it appears to download the models to my local machine and then copy them out. I’ve worked around this with local wrappers that execute the commands on my Sparks, but native support would be nice.
Surprise! It has that! It’s one of the newer features that I haven’t really “advertised” yet because it’s still experimental.
sparkrun run <recipe> --tp 2 --transfer-mode delegated
There are 3 transfer modes (plus auto).
local: get local copies of files and push them directly over CX7 to rest of cluster (requires CX7 connectivity)
push: get local copies of files and push them to head node which will push to rest of cluster (over CX7)
delegated: orchestrate the head node to fetch files, etc. and then push to rest of cluster (over CX7)
auto: essentially selects between local and push based on whether CX7 connectivity is available from the sparkrun node to the balance of cluster
So the delegated mode is basically what you’re looking for. If it works for you, you can also do sparkrun cluster update <cluster-name> --transfer-mode delegated to make it apply by default for that cluster so you don’t have to type it repeatedly.
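The auto selection described above could be sketched like this; resolve_transfer_mode and has_cx7 are hypothetical names for illustration, not sparkrun’s actual API:

```python
def resolve_transfer_mode(requested: str, has_cx7: bool) -> str:
    """Pick a concrete transfer mode.

    'auto' resolves to 'local' when the sparkrun node has CX7 connectivity
    to the rest of the cluster, and falls back to 'push' otherwise.
    """
    modes = {"local", "push", "delegated"}
    if requested in modes:
        return requested
    if requested == "auto":
        return "local" if has_cx7 else "push"
    raise ValueError(f"unknown transfer mode: {requested!r}")
```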
Got some time to test this tonight. Great to hear it’s being included! When I tried it, it looks like it wants to build the container locally even with delegated. I’ll dig a bit more to narrow down the conditions. I was testing with:
sparkrun run qwen3.5-122b-int4-autoround --transfer-mode delegated --hosts 192.168.2.21 --image vllm-node-tf5:latest
Reusing eugr repo from registry cache: /Users/cecil/.cache/sparkrun/registries/eugr-vllm
Updating eugr/spark-vllm-docker repo...
Building eugr container: /Users/cecil/.cache/sparkrun/registries/eugr-vllm/build-and-copy.sh -t vllm-node-tf5:latest --tf5
Commit hash matches (65d6e4a2) — wheels are up to date.
All flashinfer wheels are up to date — skipping download.
FlashInfer wheels ready.
Downloading vllm-0.17.0rc1.dev139+g85f50eb41.d20260307.cu131-cp312-cp312-linux_aarch64.whl...
######################################################################## 100.0%
Recorded vllm commit hash: 85f50eb41
vLLM wheels ready.
Using transformers>=5.0.0...
Building runner image with command: docker build -t vllm-node-tf5:latest --build-arg BUILD_JOBS=16 --build-arg TORCH_CUDA_ARCH_LIST=12.1a --build-arg FLASHINFER_CUDA_ARCH_LIST=12.1a --build-arg PRE_TRANSFORMERS=1 .
ERROR: failed to connect to the docker API at unix:///Users/cecil/.docker/run/docker.sock; check if the path is correct and if the daemon is running: dial unix /Users/cecil/.docker/run/docker.sock: connect: no such file or directory
Traceback (most recent call last):
File "/Users/cecil/.local/bin/sparkrun", line 10, in <module>
sys.exit(main())
^^^^^^
File "/Users/cecil/.local/share/uv/tools/sparkrun/lib/python3.12/site-packages/click/core.py", line 1485, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/cecil/.local/share/uv/tools/sparkrun/lib/python3.12/site-packages/click/core.py", line 1406, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/Users/cecil/.local/share/uv/tools/sparkrun/lib/python3.12/site-packages/click/core.py", line 1873, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/cecil/.local/share/uv/tools/sparkrun/lib/python3.12/site-packages/click/core.py", line 1269, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/cecil/.local/share/uv/tools/sparkrun/lib/python3.12/site-packages/click/core.py", line 824, in invoke
return callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/cecil/.local/share/uv/tools/sparkrun/lib/python3.12/site-packages/click/decorators.py", line 34, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/cecil/.local/share/uv/tools/sparkrun/lib/python3.12/site-packages/sparkrun/cli/_run.py", line 230, in run
runtime.prepare(recipe, host_list, config=config, dry_run=dry_run)
File "/Users/cecil/.local/share/uv/tools/sparkrun/lib/python3.12/site-packages/sparkrun/runtimes/eugr_vllm_ray.py", line 85, in prepare
builder.prepare_image(image, recipe, hosts, config=config, dry_run=dry_run)
File "/Users/cecil/.local/share/uv/tools/sparkrun/lib/python3.12/site-packages/sparkrun/builders/eugr.py", line 91, in prepare_image
self._build_image(image, build_args, dry_run)
File "/Users/cecil/.local/share/uv/tools/sparkrun/lib/python3.12/site-packages/sparkrun/builders/eugr.py", line 163, in _build_image
raise RuntimeError("eugr container build failed (exit %d)" % result.returncode)
RuntimeError: eugr container build failed (exit 1)
eugr’s build is transitioning to him publishing verified wheels, with the docker build then being a fast confirmation step, so the latest versions of sparkrun, I think, always go through the build (the same way I always trigger the HF download even if a cache exists, just to confirm the cache; it’s usually quite fast if everything is up to date) – but indeed it shouldn’t be triggering locally if we’re in delegated mode.
The extra confusion here is that recipes from eugr’s repo / v1 recipes are always translated to use eugr-vllm runtime as a compatibility measure. Technically sparkrun’s recipe format is v2 and is generally a superset of the v1 recipes – but in order to ensure compatibility, if particular v1-specific elements are in the recipe, it’ll automatically act in a compatibility mode (likely making your image override effectively useless).
Looks like there might be other problems as well – I’m in the midst of creating more standardized extension points for “builders” and pre-exec/post-exec commands to allow a lot more flexibility in how sparkrun can be used. I’m aiming to make eugr’s build system “independent” of the runtime, since once the container + mods are ready, it doesn’t matter where they came from; the execution path should be unified.
So anyway, I’m in the middle of transitions and they may not all play well together with building and delegated transfer mode (yet).
Make sure you’re on the latest version and do:
sparkrun --verbose run qwen3.5-122b-int4-autoround --transfer-mode delegated --hosts 192.168.2.21 --image vllm-node-tf5:latest (so the exact same thing, but adding --verbose between sparkrun and run). That’s going to be way more verbose and include a lot of extra debugging output, which might be helpful for me to see the flow on your systems/situation. I’ll keep looking at this; the delegated transfer mode isn’t considered ready to go yet (obviously), but I do want to get it there soon.
I think I prefer if you make a github issue so that we don’t have too much of a debugging conversation in this forum topic. Happy to iterate a little bit on this with you to make sure that we get it right. There are a few parts we need to address, but it shouldn’t take too long to get through it.
Experimental new feature! sparkrun proxy. A lot of people are using litellm as a proxy, so I thought this would be helpful for various different purposes. It’s hot off the presses, so not mature yet! At this point, the proxy runs on the machine from which you are running sparkrun.
sparkrun proxy start launches an automatically configured litellm proxy that shares your sparkrun cluster jobs. If you’ve got 4 Sparks running 4 models, you can now see them all at one endpoint. Auto-discovery at 30s intervals updates the models.
drew@spark-2918:~$ sparkrun proxy start
Discovering inference endpoints...
Discovered 1 healthy endpoint(s):
10.24.11.13:8002 — qwen3.5-0.8b (sglang)
Starting proxy on 0.0.0.0:4000...
Proxy started (PID 117622) on 0.0.0.0:4000
Log: /home/drew/.cache/sparkrun/proxy/litellm.log
Proxy started. API: http://localhost:4000/v1
sparkrun proxy alias lets you name/rename models on the fly.
drew@spark-2918:~$ sparkrun proxy alias add small qwen3.5-0.8b
Alias added: small -> qwen3.5-0.8b
Reloading proxy to apply alias...
Sent SIGTERM to proxy PID 117622
Proxy started (PID 121020) on 0.0.0.0:4000
Log: /home/drew/.cache/sparkrun/proxy/litellm.log
Proxy reloaded.
drew@spark-2918:~$ sparkrun proxy alias list
Aliases:
small -> qwen3.5-0.8b
drew@spark-2918:~$ sparkrun proxy status
Proxy status: running
PID: 121020
Host: 0.0.0.0
Port: 4000
Start: 2026-03-09T00:49:09.306688+00:00
Registered models (2):
small
qwen3.5-0.8b
drew@spark-2918:~$ sparkrun proxy stop
Sent SIGTERM to proxy PID 121020
Proxy stopped.
I’ve tried to set up earlyoom the sparkrun way, but I had to use different commands than expected.
I set up my CX-7 manually and added a cluster following the instructions, which stipulate that we have to use the management IPs as arguments.
But then this failed, because it tried to use the management IPs to connect:
$ sparkrun setup earlyoom --cluster mycluster
This worked (using the CX-7 IPs):
$ sparkrun setup earlyoom --hosts 192.168.10.11,192.168.10.12
So maybe the instructions are wrong and the --cluster switch should not be used?
The reason for using the management IPs is because I think we should be strict about what traffic goes over which interface. It’s not like it doesn’t work when you use different interfaces if they’re all the same machine. I just try to enforce that we minimize traffic on the CX7 interfaces that isn’t specifically related to scaling inference. (And once connected, sparkrun will detect the CX7 interfaces and configure using them for NCCL, etc.)
If you did your own CX7 setup, my hypothesis is that you configured passwordless SSH for your CX7 interfaces but not the management interfaces – which would explain the outcome here.
The commands are being sent over SSH (unless a node is 127.0.0.1). So when you tried the install over CX7 interfaces, it was able to SSH in, and therefore was able to proceed. At least that’s my guess from the information available.
Essentially, the --hosts option is an override on the hosts applied for the current command, and it’s a valid way to target machines for activities. The priority order is basically: explicit hosts > explicit cluster > default cluster.
So especially for a “one-off” command like sparkrun setup earlyoom, honestly whatever makes it work is fine – but it should work with the default cluster or with --cluster, but will require that passwordless SSH is configured for the management interfaces.
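That priority order can be sketched as follows; the function and its argument shapes are hypothetical, for illustration only:

```python
def resolve_hosts(explicit_hosts, explicit_cluster, default_cluster):
    """Resolve target hosts for a command.

    Priority: explicit --hosts > explicit --cluster > default cluster.
    Clusters are assumed to be dicts carrying a "hosts" list.
    """
    if explicit_hosts:
        return list(explicit_hosts)
    if explicit_cluster is not None:
        return list(explicit_cluster["hosts"])
    if default_cluster is not None:
        return list(default_cluster["hosts"])
    raise RuntimeError("no hosts specified and no default cluster configured")
```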
Thank you, Drew, all your assumptions were correct :). Passwordless SSH was not yet set up on the management IPs, as I’ve not yet synchronised any data over that channel.
I’m still glad that it worked as it was quick and efficient to get it done that way.
Subject: Thank you for sparkrun!
Thank you, @dbsci, for creating and sharing this!
I have to admit, at first glance, I assumed sparkrun was another tool for managing large-scale Spark clusters—something useful but not immediately relevant to my daily work. However, after reading through other posts you have commented on and seeing the details, I realize that is not the case at all.
This tool is incredibly useful, even for someone like me working primarily on single nodes like the DGX Spark. The ability to get VRAM estimates before even launching a model (sparkrun show and sparkrun recipe vram) is a game-changer for planning what fits. The tab completion for recipes and options is a fantastic quality-of-life feature, and the unified interface to run vllm, sglang, or llama.cpp with a simple sparkrun run command streamlines everything.
It’s a very nice way of pulling together different recipe repositories (like eugr’s) into one central, easy-to-use command. It truly gives us the best of both worlds: the simplicity of a local launcher with the power and flexibility to scale to a cluster when needed. Great work on this, and thank you for making it experimental but already so polished and useful!
Mark
Thank you / you’re welcome.
I’m glad it’s helpful to you. I’m working with @eugr and @raphael.amorim to try to enhance recipe availability, container images, and finding recipes on https://spark-arena.com so that we can all focus more on what we’re trying to actually do instead of the logistics.
There is a lot of “unexplored” depth in the registry functionality that was added for a reason, so stay tuned… we’re going to make it even more convenient ;-)
-Drew
Ok major update to sparkrun proxy! Still experimental, but hopefully ready for wider testing.
Update now to try it: sparkrun update
sparkrun proxy
A unified OpenAI-compatible gateway that discovers running sparkrun inference endpoints and exposes them through a single API powered by LiteLLM.
Overview
The proxy sits in front of one or more inference workloads launched by sparkrun and provides:
- Live endpoint discovery using the same mechanism as sparkrun cluster status and sparkrun cluster monitor
- Auto-discovery background process that periodically re-scans and syncs models (in case of drift)
- Health checking via GET /v1/models on each discovered endpoint
- Deduplication of endpoints reachable on multiple network interfaces (e.g. management IP vs ConnectX-7 IP)
- Model aliases managed via the LiteLLM management API (the proxy does not need to restart to add/remove models or aliases)
- Load/unload models through sparkrun proxy load or sparkrun proxy unload to keep the proxy in sync (although auto-discovery should also ensure models are available)
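The deduplication step could look roughly like the sketch below. The endpoint shape (model, port, host_id fields) is a guess; it assumes discovery records some stable host identity (e.g. hostname) alongside each reachable IP, so the same backend seen via management and CX7 addresses collapses to one entry:

```python
def dedupe_endpoints(endpoints):
    """Collapse endpoints that are the same backend reached via different
    interfaces (e.g. management IP vs ConnectX-7 IP).

    Each endpoint is assumed to be a dict with "model", "port", and a stable
    "host_id"; the first IP seen for a given backend wins.
    """
    seen = {}
    for ep in endpoints:
        key = (ep["model"], ep["port"], ep["host_id"])
        seen.setdefault(key, ep)
    return list(seen.values())
```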
# Start the proxy (discovers endpoints automatically if relevant)
sparkrun proxy start
# Load a new model
sparkrun proxy load qwen3.5-0.8b-bf16-sglang
# Query models through the unified API
curl http://localhost:4000/v1/models
Commands
sparkrun proxy start
Discovers running endpoints, generates a LiteLLM config, and launches the proxy. When a cluster or hosts are specified, discovery uses live SSH queries (docker ps) for authoritative container state. A background auto-discover process periodically re-scans and syncs models with the proxy.
sparkrun proxy start # defaults: 0.0.0.0:4000
sparkrun proxy start --port 8080 # custom port
sparkrun proxy start --cluster mylab # set target cluster
sparkrun proxy start --hosts 10.0.0.1,10.0.0.2 # explicit host list (overrides cluster hosts)
sparkrun proxy start --foreground # run in foreground (blocking)
By default, the proxy daemonizes in the background. Logs are written to ~/.cache/sparkrun/proxy/litellm.log.
sparkrun proxy stop
Sends SIGTERM to the running proxy and its auto-discover process using the stored PIDs.
sparkrun proxy stop
sparkrun proxy status
Shows whether the proxy is running, its PID, bind address, auto-discover status, and lists models registered via the LiteLLM management API.
sparkrun proxy status
sparkrun proxy models
Lists models currently registered with the running proxy. With --refresh, re-discovers endpoints and syncs the proxy — adding newly available models and removing stale entries whose backends are no longer healthy.
sparkrun proxy models
sparkrun proxy models --refresh
sparkrun proxy load <recipe>
Launches an inference workload via sparkrun run (detached) and registers it with the running proxy.
Unlike plain sparkrun run, proxy load automatically avoids port conflicts. When no --port is specified, it loads the recipe to determine the desired port (e.g. 8000), then checks the head host over SSH (using nc -z, the same mechanism as sparkrun benchmark) to find the first available port. If the desired port is occupied, it increments until a free port is found:
$ sparkrun proxy load qwen3-1.7b-vllm
# Uses port 8000
$ sparkrun proxy load qwen3.5-35b-a3b-fp8-sglang
# Note: port 8000 in use on 10.24.11.13, using 8001 instead
This is intentionally different from sparkrun run, which uses exactly the port specified (or the recipe default) and fails if it’s occupied — preserving the user’s explicit intent. The proxy’s load command is designed for managing multiple concurrent models where automatic port assignment is expected.
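The port-probing loop described above might be sketched like this; find_free_port and the is_port_in_use callable are hypothetical names, with the real check running nc -z on the head host over SSH:

```python
def find_free_port(desired: int, is_port_in_use, max_tries: int = 100) -> int:
    """Starting at the recipe's desired port, increment until a free port is found.

    is_port_in_use is a callable taking a port number and returning True if
    it is occupied (e.g. a wrapper around 'nc -z' over SSH).
    """
    port = desired
    for _ in range(max_tries):
        if not is_port_in_use(port):
            return port
        port += 1
    raise RuntimeError(f"no free port found starting at {desired}")
```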
sparkrun proxy unload <recipe>
Stops the inference workload containers directly (same logic as sparkrun stop) and syncs the proxy to remove the now-stale model entry.
sparkrun proxy unload qwen3-1.7b-vllm --cluster mylab
sparkrun proxy alias
Manage model aliases so clients can reference models by friendly names. Aliases are applied and removed via the LiteLLM management API — no proxy restart required.
sparkrun proxy alias add qwen3-small "Qwen/Qwen3-1.7B"
sparkrun proxy alias remove qwen3-small
sparkrun proxy alias list
Auto-discovery
When the proxy starts with auto-discover enabled (the default), a background process runs alongside the proxy:
- Periodically calls discover_endpoints at the configured interval (default: 30 seconds)
- Syncs results with the proxy via the LiteLLM management API (adds new models, removes stale ones)
- Also syncs configured aliases on each sweep
- Monitors the proxy PID and exits automatically when the proxy dies
- Runs as a detached subprocess
Can be disabled with --no-auto-discover.
Architecture
sparkrun proxy start
│
▼
┌──────────────┐ ┌──────────────────┐
│ Discovery │────▶│ Health Check │
│ (SSH) │ │ (GET /v1/models)│
└──────────────┘ └────────┬─────────┘
│
▼
┌──────────────────┐
│ LiteLLM Config │
│ Generation │
└────────┬─────────┘
│
▼
┌──────────────────┐ ┌──────────────────┐
│ uvx litellm │ │ Auto-discover │
│ (subprocess) │◀───▶│ (background) │
└──────────────────┘ └──────────────────┘
│
▼
OpenAI-compatible API
on localhost:4000 (or selected port)
Clients ──▶ localhost:4000/v1/... ──▶ LiteLLM ──▶ backend endpoints
Good afternoon. Is a web interface for sparkrun planned for the future? I don’t really like using commands in the terminal, as my main job isn’t related to programming, and I think there are more and more users like me. I use AI when configuring and running models on DGX Spark. A web interface would greatly improve the usability of sparkrun. Thank you for providing a high-quality product!
I’m glad if you find it useful.
I actually was making a more ambitious project that did also have a webUI (the idea was to make what I thought NVIDIA Sync + Dashboard should have been) – and sparkrun was basically created from extracting the control components of that project.
I have been considering making a web interface to make it more accessible to users like yourself and resurrecting some of that earlier work; however, I think that will come a bit later, after further refinement of the developer-focused aspects.
I hope it’s still useful to you in the meantime and feel free to comment about which activities are the highest priority for you via web (just run/stop? inventory of recipes? recipe search? cluster status? cluster setup? proxy?). I am making sparkrun both for my own reasons and for the community, so I have been trying to be responsive to community needs to try to make sparkrun as useful as possible.
