Experimental but hopefully useful release: sparkrun!
Run everything (vLLM + SGLang + llama.cpp); solo or cluster; get VRAM estimates; recipes that are easy to distribute and share!
Installation
# uv is the preferred mechanism for managing Python environments
# To install uv:
curl -LsSf https://astral.sh/uv/install.sh | sh
# Automatic installation via uvx (manages the virtual environment,
# creates an alias in your shell, and sets up autocomplete too!)
uvx sparkrun setup install
With tab completion support for recipes and options ;-)
Create a Cluster
# Save your hosts once; you can have multiple named clusters; a "cluster"
# can even be the local machine itself (host 127.0.0.1)
sparkrun cluster create mylab --hosts 192.168.11.13,192.168.11.14 -d "My DGX Spark lab"
sparkrun cluster set-default mylab
Run a model
# Run Qwen3 1.7B via SGLang
sparkrun run qwen3-1.7b-sglang
# Run Qwen3 1.7B via vLLM
sparkrun run qwen3-1.7b-vllm
Models come from recipes, compatible with the recipes in eugr/spark-vllm-docker (https://github.com/eugr/spark-vllm-docker). In fact, part of the idea is that this is a more generic launcher, aligning with @eugr and @raphael.amorim's spark-area.com direction: when used with recipes from eugr's repo, sparkrun runs eugr's scripts directly. Otherwise, you can use vllm or sglang with other images (mine, NVIDIA's, your own, etc.).
VRAM Estimates!
$ sparkrun show qwen3-1.7b-sglang
Name: qwen3-1.7b-sglang
Description: Qwen3 1.7B -- small test model, solo or cluster (SGLang)
Maintainer: scitrera.ai <open-source-team@scitrera.com>
Runtime: sglang
Model: Qwen/Qwen3-1.7B
Container: scitrera/dgx-spark-sglang:0.5.8-t5
Nodes: 1 - unlimited
Repository: Local
File Path: /home/drew/oss-sparkrun/recipes/qwen3-1.7b-sglang.yaml
Defaults:
gpu_memory_utilization: 0.3
host: 0.0.0.0
port: 8000
served_model_name: qwen3-1.7b
tensor_parallel: 1
Command:
python3 -m sglang.launch_server \
--model-path {model} \
--served-model-name {served_model_name} \
--mem-fraction-static {gpu_memory_utilization} \
--tp-size {tensor_parallel} \
--host {host} \
--port {port} \
--reasoning-parser deepseek-r1 \
--trust-remote-code
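The `{placeholder}` syntax in the command template looks like Python's `str.format`. A minimal sketch of how the defaults above might be substituted into the template (this is my assumption about the mechanism, not sparkrun's actual code):

```python
# Hypothetical sketch: render a recipe command template with its defaults.
# Template and defaults are copied from the `sparkrun show` output above;
# the substitution mechanism (plain str.format) is an assumption.
template = (
    "python3 -m sglang.launch_server "
    "--model-path {model} "
    "--served-model-name {served_model_name} "
    "--mem-fraction-static {gpu_memory_utilization} "
    "--tp-size {tensor_parallel} "
    "--host {host} "
    "--port {port} "
    "--reasoning-parser deepseek-r1 "
    "--trust-remote-code"
)

defaults = {
    "model": "Qwen/Qwen3-1.7B",
    "served_model_name": "qwen3-1.7b",
    "gpu_memory_utilization": 0.3,
    "tensor_parallel": 1,
    "host": "0.0.0.0",
    "port": 8000,
}

command = template.format(**defaults)
print(command)
```

Any default can be treated as a knob: change the dict, and the rendered command changes accordingly.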
VRAM Estimation:
Model dtype: bf16
Model params: 1,700,000,000
KV cache dtype: bfloat16
Architecture: 28 layers, 8 KV heads, 128 head_dim
Model weights: 3.17 GB
Tensor parallel: 1
Per-GPU total: 3.17 GB
DGX Spark fit: YES
GPU Memory Budget:
gpu_memory_utilization: 30%
Usable GPU memory: 36.3 GB (121 GB x 30%)
Available for KV: 33.1 GB
Max context tokens: 310,205
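The estimate above is reproducible with back-of-the-envelope arithmetic. Here is my reconstruction of the likely math (not sparkrun's actual code); it assumes "GB" in the output means GiB, 2 bytes per bf16 value, and the usual KV-cache sizing of 2 tensors (K and V) × layers × kv_heads × head_dim per token:

```python
GIB = 1024**3

# Figures taken from the `sparkrun show` output above
params = 1_700_000_000        # Qwen3-1.7B parameter count
bytes_per_param = 2           # bf16 weights
layers, kv_heads, head_dim = 28, 8, 128
kv_bytes_per_value = 2        # bfloat16 KV cache
gpu_mem_gib = 121             # DGX Spark memory budget
gpu_mem_util = 0.30           # gpu_memory_utilization default

weights_gib = params * bytes_per_param / GIB    # ~3.17
usable_gib = gpu_mem_gib * gpu_mem_util         # ~36.3
kv_budget_gib = usable_gib - weights_gib        # ~33.1

# Per-token KV cache: K and V tensors for every layer
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * kv_bytes_per_value
max_tokens = int(kv_budget_gib * GIB / kv_bytes_per_token)
print(f"max context tokens: {max_tokens:,}")    # ~310,205
```

Raising `gpu_memory_utilization` grows the KV budget (and thus max context) linearly once the fixed weight cost is covered.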
Supports multiple registries – you can add your own or use local recipes
drew@spark-840b:~$ sparkrun list
Name Runtime Registry File
---------------------------------------------------------------------------------------------------------
nemotron3-nano-30b-nvfp4-vllm vllm sparkrun-official nemotron3-nano-30b-nvfp4-vllm
nemotron3-nano-30b-vllm vllm sparkrun-official nemotron3-nano-30b-vllm
qwen3-1.7b-sglang sglang sparkrun-official qwen3-1.7b-sglang
qwen3-1.7b-vllm vllm sparkrun-official qwen3-1.7b-vllm
qwen3-coder-next-fp8-sglang-cluster sglang sparkrun-official qwen3-coder-next-fp8-sglang-cluster
GLM-4.7-Flash-AWQ eugr-vllm eugr-vllm glm-4.7-flash-awq
MiniMax-M2-AWQ eugr-vllm eugr-vllm minimax-m2-awq
MiniMax-M2.5-AWQ eugr-vllm eugr-vllm minimax-m2.5-awq
Nemotron-3-Nano-NVFP4 eugr-vllm eugr-vllm nemotron-3-nano-nvfp4
OpenAI GPT-OSS 120B eugr-vllm eugr-vllm openai-gpt-oss-120b
Qwen3-Coder-Next-FP8 eugr-vllm eugr-vllm qwen3-coder-next-fp8
# eugr-vllm pulls recipes from https://github.com/eugr/spark-vllm-docker and
# essentially passes through to the scripts there; sparkrun aims to be a
# unifying interface for running jobs, and it would be woefully incomplete
# without eugr's repo
drew@spark-840b:~$ sparkrun recipe --help
Usage: sparkrun recipe [OPTIONS] COMMAND [ARGS]...
Manage recipe registries and search for recipes.
Options:
--help Show this message and exit.
Commands:
add-registry Add a new recipe registry.
list List available recipes from all registries.
registries List configured recipe registries.
remove-registry Remove a recipe registry.
search Search for recipes by name, model, or description.
show Show detailed recipe information.
update Update recipe registries from git.
validate Validate a recipe file.
vram Estimate VRAM usage for a recipe on DGX Spark.
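For local recipes, the file presumably mirrors the fields that `sparkrun show` prints. A guessed sketch of what such a YAML might look like (the field names here are my assumption from the `show` output, not the documented schema; check yours with `sparkrun recipe validate`):

```yaml
# Hypothetical recipe sketch reconstructed from `sparkrun show` output;
# actual sparkrun schema may use different keys.
name: qwen3-1.7b-sglang
description: Qwen3 1.7B -- small test model, solo or cluster (SGLang)
runtime: sglang
model: Qwen/Qwen3-1.7B
container: scitrera/dgx-spark-sglang:0.5.8-t5
defaults:
  gpu_memory_utilization: 0.3
  host: 0.0.0.0
  port: 8000
  served_model_name: qwen3-1.7b
  tensor_parallel: 1
command: |
  python3 -m sglang.launch_server \
    --model-path {model} \
    --served-model-name {served_model_name} \
    --mem-fraction-static {gpu_memory_utilization} \
    --tp-size {tensor_parallel} \
    --host {host} \
    --port {port} \
    --trust-remote-code
```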