sparkrun. I also recommend it for single nodes (not just multi-node clusters).
The latest version also includes a CLI setup wizard that walks you through networking, SSH config, etc. The wizard is new, so feedback is appreciated, but hopefully it helps with the configuration as well.
Install uv if you don’t have it already: curl -LsSf https://astral.sh/uv/install.sh | sh
Install sparkrun and start the wizard: uvx sparkrun setup
You’ll probably want to accept most of the defaults / say yes a lot – but you’ll have to give it the IPs of your first node and the other node when it asks, e.g. 127.0.0.1,192.168.44.21, where 127.0.0.1 means the current system and 192.168.44.21 is the Ethernet IP of your new 2nd Spark. It may ask you to type in passwords as part of the setup process; it doesn’t save them. (Example IPs assume you’re operating from Spark #1.)
Then you can use existing “recipes” for sglang models from the preconfigured registries or make your own.
drew@spark-2918:~$ sparkrun list sglang
Name Runtime TP Nodes GPU Mem Model Registry
--------------------------------------------------------------------------------------------------------------------
qwen3-1.7b-sglang sglang 1 1 0.3 Qwen/Qwen3-1.7B sparkrun-transitional
qwen3-coder-next-fp8-sglang sglang 2 2 0.8 Qwen/Qwen3-Coder-Next-FP8 sparkrun-transitional
qwen3.5-0.8b-bf16-sglang sglang 1 1 0.8 Qwen/Qwen3.5-0.8B sparkrun-transitional
qwen3.5-122b-a10b-fp8-sglang sglang 2 2 0.8 Qwen/Qwen3.5-122B-A10B-FP8 sparkrun-transitional
qwen3.5-27b-fp8-sglang sglang 1 1 0.8 Qwen/Qwen3.5-27B-FP8 sparkrun-transitional
qwen3.5-2b-bf16-sglang sglang 1 1 0.8 Qwen/Qwen3.5-2B sparkrun-transitional
qwen3.5-35b-a3b-bf16-sglang sglang 1 1 0.8 Qwen/Qwen3.5-35B-A3B sparkrun-transitional
qwen3.5-35b-a3b-fp8-sglang sglang 1 1 0.8 Qwen/Qwen3.5-35B-A3B-FP8 sparkrun-transitional
qwen3.5-4b-bf16-sglang sglang 1 1 0.8 Qwen/Qwen3.5-4B sparkrun-transitional
qwen3.5-9b-bf16-sglang sglang 1 1 0.8 Qwen/Qwen3.5-9B sparkrun-transitional
The registries are all publicly available git repos. Recently, I’ve been working with @eugr and @raphael.amorim on Spark Arena, and since @eugr is the king of DGX Spark vLLM, I’ve been rather vLLM-focused lately (the past 1-2 weeks), but I do plan to come back to sglang containers and recipes.
You can run an existing recipe easily enough:
Run it with default settings: sparkrun run qwen3.5-35b-a3b-fp8-sglang
Override to use tensor parallelism across nodes and reduce GPU memory utilization: sparkrun run qwen3.5-35b-a3b-fp8-sglang --tp 2 --gpu-mem 0.5. That should give you a nice speed boost by leveraging both nodes; I reduced the target memory utilization in this example to leave more RAM open for other things.
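Once a recipe is up, sglang exposes an OpenAI-compatible HTTP API, so you can talk to it with plain curl. A quick sketch (port 30000 is sglang's default; your recipe may bind a different host/port, so check the launch output):

```shell
# Build the request payload (model name taken from the recipe above):
PAYLOAD='{"model": "Qwen/Qwen3.5-35B-A3B-FP8", "messages": [{"role": "user", "content": "Hello from the Spark cluster!"}], "max_tokens": 64}'

# Sanity-check that the payload is valid JSON before sending:
echo "$PAYLOAD" | python3 -m json.tool > /dev/null && echo "payload ok"

# With the server running, send it (assumption: sglang's default port 30000):
# curl -s http://127.0.0.1:30000/v1/chat/completions \
#   -H "Content-Type: application/json" -d "$PAYLOAD"
```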
You can view the recipe text with:
sparkrun export recipe qwen3.5-35b-a3b-fp8-sglang
–or–
save it to a file with: sparkrun export recipe qwen3.5-35b-a3b-fp8-sglang --save my-recipe.yaml
Then you can edit the defaults to your preferences, save it, and run
sparkrun run ./my-recipe.yaml; it won’t require you to override settings at the CLI.
When you make your own recipes, you can also change the model, base container, etc., so you can pretty much automate running whatever you want to run. You can then publish your recipes to registries to manage them via git or to share them with others.
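For a rough sense of what a recipe holds, here's a hand-written sketch based on the columns from sparkrun list – the field names are illustrative guesses, not the real schema, so export an existing recipe to see the actual format:

```yaml
# Illustrative only -- field names guessed from the `sparkrun list` columns.
# Run `sparkrun export recipe <name>` to see a real recipe's schema.
name: my-qwen3.5-35b
runtime: sglang
model: Qwen/Qwen3.5-35B-A3B-FP8
tp: 2
nodes: 2
gpu_mem: 0.5
```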
You could even install a model as a system service with sparkrun export systemd.
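I haven't pasted real sparkrun export systemd output here, but a generated unit would presumably look something along these lines (unit name, paths, and ExecStart are all assumptions on my part, not sparkrun's actual output):

```ini
[Unit]
Description=sparkrun inference: qwen3.5-35b-a3b-fp8-sglang
After=network-online.target
Wants=network-online.target

[Service]
# Illustrative only -- the real exported unit may differ.
ExecStart=/usr/local/bin/sparkrun run qwen3.5-35b-a3b-fp8-sglang
Restart=on-failure

[Install]
WantedBy=multi-user.target
```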
You can also run sparkrun on a separate Linux/Mac/(Windows via WSL) machine (not one of your Sparks) and use it to manage/orchestrate your Sparks.
There are fairly complete docs on the website (https://sparkrun.dev) so you can look stuff up there or chat on the forums about it: Sparkrun - central command with tab completion for launching inference on Spark Clusters - #48 by dbsci
Happy Sparking!
P.S. >> I forgot to mention there is also a Claude Code plugin. So once you’re set up, you can use the plugin to check/start/stop inference jobs via Claude Code. More AI automation will be coming soon.