Time for experiments! sparkrun: a central command, with tab completion, for launching inference on DGX Spark clusters.
Centralized tool for running inference recipes drawn from multiple registries, with the ability to add your own; it supports eugr's vLLM builds, my container images, and others (NVIDIA's too). Tab autocompletion for recipe lookups, commands, etc. Named clusters and a saved default make working with multiple clusters easy, in both solo and cluster modes. Built-in VRAM estimation. Supports vLLM and SGLang, and it's easy to add more runtimes (pass-through delegation to the recipe scripts in eugr's repo is implemented as just another runtime). The recipe design was adjusted to be extremely similar to yours and @eugr's recipes, so that (1) they stay compatible for working together and (2) we can iterate on some details.
I'm also interested in collaborating and contributing it to a community org that manages such resources.
drew@spark-840b:~$ sparkrun search qwen3
Name                                 Runtime    Model                      Registry
-----------------------------------------------------------------------------------
qwen3-1.7b-sglang                    sglang     Qwen/Qwen3-1.7B            sparkrun-official
qwen3-1.7b-vllm                      vllm       Qwen/Qwen3-1.7B            sparkrun-official
qwen3-coder-next-fp8-sglang-cluster  sglang     Qwen/Qwen3-Coder-Next-FP8  sparkrun-official
Qwen3-Coder-Next-FP8                 eugr-vllm  Qwen/Qwen3-Coder-Next-FP8  eugr-vllm
This is extracted from what I was working on before: basically, I was making what I thought NVIDIA Sync should have been, which also included way more UI, a ConnectX-7 setup wizard, etc. It's way faster to dump stuff into a CLI.
And making sure tab completion worked was like the best decision ever… it really does make life better…
drew@spark-840b:~$ sparkrun cluster create DGXSolo --hosts 127.0.0.1
Created cluster 'DGXSolo' with 1 hosts
drew@spark-840b:~$ sparkrun cluster set-default DGXSolo
Set default cluster to 'DGXSolo'
drew@spark-840b:~$ sparkrun run qwen3-
qwen3-1.7b-sglang qwen3-1.7b-vllm qwen3-coder-next-fp8-sglang-cluster qwen3-coder-next-fp8
drew@spark-840b:~$ sparkrun run qwen3-1.7b-
qwen3-1.7b-sglang qwen3-1.7b-vllm
drew@spark-840b:~$ sparkrun run qwen3-1.7b-sglang
Ensuring container image is available locally...
Image already available: scitrera/dgx-spark-sglang:0.5.8-t5
Ensuring model Qwen/Qwen3-1.7B is available locally...
Downloading model: Qwen/Qwen3-1.7B...
Fetching 12 files: 100%|████████████████████| 12/12 [00:24<00:00, 2.07s/it]
Download complete: 100%|████████████████████| 4.06G/4.06G [00:24<00:00, 335MB/s]
Model downloaded successfully: Qwen/Qwen3-1.7B
Runtime: sglang
Image: scitrera/dgx-spark-sglang:0.5.8-t5
Model: Qwen/Qwen3-1.7B
Cluster: sparkrun_80efe3c1ea32
Mode: solo
config.json: 100%|████████████████████| 726/726 [00:00<00:00, 14.1MB/s]
VRAM Estimation:
Model dtype: bf16
Model params: 1,700,000,000
KV cache dtype: bfloat16
Architecture: 28 layers, 8 KV heads, 128 head_dim
Model weights: 3.17 GB
Tensor parallel: 1
Per-GPU total: 3.17 GB
DGX Spark fit: YES
GPU Memory Budget:
gpu_memory_utilization: 30%
Usable GPU memory: 36.3 GB (121 GB x 30%)
Available for KV: 33.1 GB
Max context tokens: 310,205
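For the curious, the numbers in the estimate above check out. Here's a back-of-envelope sketch of the same arithmetic (my own reconstruction, not sparkrun's actual code; the 121 GB budget, 30% utilization, and model shape are taken straight from the output above):

```python
# Reconstructing sparkrun's VRAM estimate for Qwen3-1.7B (bf16 weights,
# bfloat16 KV cache, 28 layers, 8 KV heads, head_dim 128).
GIB = 1024 ** 3

params = 1_700_000_000            # model parameters
bytes_per_param = 2               # bf16 = 2 bytes/param
layers, kv_heads, head_dim = 28, 8, 128
kv_elem_bytes = 2                 # bfloat16 KV cache

# Model weights: 1.7B params x 2 bytes ≈ 3.17 GiB
weights_gib = params * bytes_per_param / GIB

# GPU budget: 121 GiB x 30% utilization = 36.3 GiB usable
usable_gib = 121 * 0.30

# Whatever isn't weights is available for the KV cache (~33.1 GiB)
kv_budget_gib = usable_gib - weights_gib

# Per token, the cache stores K and V for every layer:
# 2 x layers x kv_heads x head_dim x 2 bytes = 114,688 bytes/token
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * kv_elem_bytes

max_context = int(kv_budget_gib * GIB / kv_bytes_per_token)

print(f"Model weights:      {weights_gib:.2f} GiB")
print(f"Available for KV:   {kv_budget_gib:.1f} GiB")
print(f"Max context tokens: {max_context:,}")
```

This reproduces the 3.17 GB / 33.1 GB / 310,205-token figures from the transcript, so the estimator appears to be doing straightforward weights-plus-KV accounting.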
Hosts: default cluster 'DGXSolo'
Target: 127.0.0.1
Serve command:
python3 -m sglang.launch_server \
--model-path Qwen/Qwen3-1.7B \
--served-model-name qwen3-1.7b \
--mem-fraction-static 0.3 \
--tp-size 1 \
--host 0.0.0.0 \
--port 8000 \
--reasoning-parser deepseek-r1 \
--trust-remote-code
Step 1/3: Detecting InfiniBand on 127.0.0.1...
InfiniBand detected locally, NCCL configured
Step 1/3: IB detection done (0.3s)
Step 2/3: Launching container sparkrun_80efe3c1ea32_solo on 127.0.0.1 (image: scitrera/dgx-spark-sglang:0.5.8-t5)...
Step 2/3: Container launched (0.8s)
Step 3/3: Executing serve command in sparkrun_80efe3c1ea32_solo...
Step 3/3: Serve command dispatched (3.1s)
Following serve logs in container 'sparkrun_80efe3c1ea32_solo' on 127.0.0.1 (Ctrl-C to stop)...
...logs...
Ctrl+C stops following the logs but does not terminate inference; you can easily reconnect to the logs:
drew@spark-840b:~$ sparkrun logs qwen3-1.7b-sglang
And you can easily stop the inference job (it can obviously also be stopped via docker):
drew@spark-840b:~$ sparkrun stop qwen3-1.7b-sglang
Workload stopped on 1 host(s).