Introducing the Spark Arena

Time for experiments! Sparkrun is a central command, with tab completion, for launching inference on Spark clusters.

It's a centralized tool for running inference recipes drawn from multiple registries, with the ability to add your own. It supports eugr's vllm builds, my container images, and others (NVIDIA's, etc.). Tab autocompletion covers recipe lookups, commands, and more. Named clusters (with a saved default) make it easy to work with multiple clusters, and both solo and cluster modes are supported. It does VRAM estimation, supports vllm and sglang, and makes it easy to add more runtimes (pass-thru delegation to the recipe scripts in eugr's repo is implemented as just another runtime). The recipe design was deliberately kept very close to yours and @eugr's recipes so that (1) we stay compatible for working together and (2) we can iterate on some of the details.
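For illustration, here's a minimal sketch of what a recipe record carries. This is my guess at the shape, not sparkrun's actual schema or code; the fields are taken from the columns visible in the search output below plus the obvious serve parameters:

```python
# Hypothetical recipe record (NOT sparkrun's actual schema) based on the
# fields visible in the transcript: name, runtime, model, registry,
# plus container image and extra serve-command args.
from dataclasses import dataclass, field

@dataclass
class Recipe:
    name: str
    runtime: str        # "vllm", "sglang", or a pass-thru runtime like "eugr-vllm"
    model: str          # Hugging Face model ID
    registry: str       # which recipe registry it came from
    image: str = ""     # container image, if the runtime needs one
    args: list[str] = field(default_factory=list)  # extra serve-command flags

r = Recipe(name="qwen3-1.7b-sglang", runtime="sglang",
           model="Qwen/Qwen3-1.7B", registry="sparkrun-official",
           image="scitrera/dgx-spark-sglang:0.5.8-t5")
print(r.runtime, r.model)
```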

I'm also interested in collaborating on this and contributing it to a community org that manages such resources.

drew@spark-840b:~$ sparkrun search qwen3
Name                                  Runtime     Model                       Registry
-----------------------------------------------------------------------------------------------
qwen3-1.7b-sglang                     sglang      Qwen/Qwen3-1.7B             sparkrun-official
qwen3-1.7b-vllm                       vllm        Qwen/Qwen3-1.7B             sparkrun-official
qwen3-coder-next-fp8-sglang-cluster   sglang      Qwen/Qwen3-Coder-Next-FP8   sparkrun-official
Qwen3-Coder-Next-FP8                  eugr-vllm   Qwen/Qwen3-Coder-Next-FP8   eugr-vllm

This is extracted from something I was working on earlier: basically, I was building what I thought NVIDIA Sync should have been, which also included far more UI, a ConnectX-7 setup wizard, and so on. It's way faster to dump stuff into a CLI.

And making sure tab completion worked was one of the best decisions I made; it really does make life better.
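For anyone curious how completion like this is typically wired in a Python CLI, here's a sketch using argparse plus the third-party argcomplete package. This is an assumption on my part, not a claim about how sparkrun actually does it; `recipe_names` and its static list are hypothetical stand-ins for a registry lookup:

```python
# Sketch: shell tab completion for a CLI, via argparse + argcomplete.
# Hypothetical, not sparkrun's actual implementation. Completion degrades
# gracefully if argcomplete isn't installed.
import argparse

def recipe_names(prefix, **kwargs):
    # A real tool would query its recipe registries; static list here.
    recipes = ["qwen3-1.7b-sglang", "qwen3-1.7b-vllm",
               "qwen3-coder-next-fp8-sglang-cluster"]
    return [r for r in recipes if r.startswith(prefix)]

def build_parser():
    parser = argparse.ArgumentParser(prog="sparkrun")
    sub = parser.add_subparsers(dest="command")
    run = sub.add_parser("run")
    arg = run.add_argument("recipe")
    arg.completer = recipe_names  # argcomplete reads this attribute
    return parser

parser = build_parser()
try:
    import argcomplete            # optional: enables shell tab completion
    argcomplete.autocomplete(parser)
except ImportError:
    pass                          # CLI still works without completion

args = parser.parse_args(["run", "qwen3-1.7b-sglang"])
print(args.recipe)
```

With argcomplete, registering the entry point in the shell (`eval "$(register-python-argcomplete sparkrun)"`) is what makes `sparkrun run qwen3-<TAB>` expand like in the transcripts above.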

drew@spark-840b:~$ sparkrun cluster create DGXSolo --hosts 127.0.0.1
Created cluster 'DGXSolo' with 1 hosts

drew@spark-840b:~$ sparkrun cluster set-default DGXSolo
Set default cluster to 'DGXSolo'

drew@spark-840b:~$ sparkrun run qwen3-
qwen3-1.7b-sglang                    qwen3-1.7b-vllm                      qwen3-coder-next-fp8-sglang-cluster  qwen3-coder-next-fp8

drew@spark-840b:~$ sparkrun run qwen3-1.7b-
qwen3-1.7b-sglang  qwen3-1.7b-vllm

drew@spark-840b:~$ sparkrun run qwen3-1.7b-sglang

Ensuring container image is available locally...
Image already available: scitrera/dgx-spark-sglang:0.5.8-t5
Ensuring model Qwen/Qwen3-1.7B is available locally...
Downloading model: Qwen/Qwen3-1.7B...
Fetching 12 files: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 12/12 [00:24<00:00,  2.07s/it]
Download complete: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 4.06G/4.06G [00:24<00:00, 335MB/s]
Model downloaded successfully: Qwen/Qwen3-1.7B
Runtime:   sglang
Image:     scitrera/dgx-spark-sglang:0.5.8-t5
Model:     Qwen/Qwen3-1.7B
Cluster:   sparkrun_80efe3c1ea32
Mode:      solo
config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 726/726 [00:00<00:00, 14.1MB/s]
VRAM Estimation:
  Model dtype:      bf16
  Model params:     1,700,000,000
  KV cache dtype:   bfloat16
  Architecture:     28 layers, 8 KV heads, 128 head_dim
  Model weights:    3.17 GB
  Tensor parallel:  1
  Per-GPU total:    3.17 GB
  DGX Spark fit:    YES

  GPU Memory Budget:
    gpu_memory_utilization: 30%
    Usable GPU memory:     36.3 GB (121 GB x 30%)
    Available for KV:      33.1 GB
    Max context tokens:    310,205

Hosts:     default cluster 'DGXSolo'
  Target:  127.0.0.1

Serve command:
  python3 -m sglang.launch_server \
      --model-path Qwen/Qwen3-1.7B \
      --served-model-name qwen3-1.7b \
      --mem-fraction-static 0.3 \
      --tp-size 1 \
      --host 0.0.0.0 \
      --port 8000 \
      --reasoning-parser deepseek-r1 \
      --trust-remote-code

Step 1/3: Detecting InfiniBand on 127.0.0.1...
  InfiniBand detected locally, NCCL configured
Step 1/3: IB detection done (0.3s)
Step 2/3: Launching container sparkrun_80efe3c1ea32_solo on 127.0.0.1 (image: scitrera/dgx-spark-sglang:0.5.8-t5)...
Step 2/3: Container launched (0.8s)
Step 3/3: Executing serve command in sparkrun_80efe3c1ea32_solo...
Step 3/3: Serve command dispatched (3.1s)
Following serve logs in container 'sparkrun_80efe3c1ea32_solo' on 127.0.0.1 (Ctrl-C to stop)...

...logs...
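As an aside, the VRAM estimate in the output above can be reproduced with straightforward arithmetic. The sketch below assumes the standard per-token KV-cache formula (2 for K and V, times layers, KV heads, head dim, and dtype bytes) and that the estimator's "GB" figures are really GiB; I'm inferring both from the numbers, not from sparkrun's source, but with those assumptions the figures match:

```python
# Reproducing sparkrun's VRAM estimate from first principles.
# Assumptions (mine): "GB" in the output is GiB internally, and the KV
# cache is 2 (K and V) x layers x kv_heads x head_dim x dtype_bytes per token.
GIB = 1024 ** 3

params = 1_700_000_000          # Qwen3-1.7B
dtype_bytes = 2                 # bf16
layers, kv_heads, head_dim = 28, 8, 128

weights_bytes = params * dtype_bytes
print(f"Model weights: {weights_bytes / GIB:.2f} GB")          # -> 3.17

# GPU memory budget: 121 GB unified memory x 30% utilization
usable_bytes = 121 * GIB * 0.30
kv_budget = usable_bytes - weights_bytes
print(f"Available for KV: {kv_budget / GIB:.1f} GB")           # -> 33.1

kv_per_token = 2 * layers * kv_heads * head_dim * dtype_bytes  # bytes/token
print(f"Max context tokens: {int(kv_budget / kv_per_token):,}")  # -> 310,205
```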

Ctrl+C stops following the logs without terminating inference; you can easily reconnect:

drew@spark-840b:~$ sparkrun logs qwen3-1.7b-sglang

And the inference job is just as easy to stop (it can obviously also be stopped via docker):

drew@spark-840b:~$ sparkrun stop qwen3-1.7b-sglang
Workload stopped on 1 host(s).