New vLLM Docker Images for NVIDIA DGX Spark
Quickly publishing initial vLLM Docker images optimized for NVIDIA DGX Spark (Blackwell-ready, NCCL + PyTorch rebuilt).
Available images (so far):
- scitrera/dgx-spark-vllm:0.13.0-t4 → vLLM 0.13.0, PyTorch 2.9.1, CUDA 13.0.2, Transformers 4.57.5, Triton 3.5.1, NCCL 2.28.9-1
- scitrera/dgx-spark-vllm:0.14.0rc2-t4 → vLLM 0.14.0rc2, PyTorch 2.10.0-rc6, CUDA 13.1.0, Transformers 4.57.5, Triton 3.5.1, NCCL 2.28.9-1
- scitrera/dgx-spark-vllm:0.14.0rc2-t5 → vLLM 0.14.0rc2, PyTorch 2.10.0-rc6, CUDA 13.1.0, Transformers 5.0.0rc3, Triton 3.5.1, NCCL 2.28.9-1
All images include Ray for multi-node / cluster deployments, and the -t5 variant ships Transformers 5 to enable use with GLM-4.6V.
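For multi-node serving, a rough starting point is to launch a Ray head in one container, join workers from the other nodes, then run vllm serve against the cluster. This is only a sketch using standard Ray / vLLM CLI flags; the container name vllm-head, HEAD_IP, and the model are placeholders, not anything image-specific:

# Head node: start the Ray head (named so we can exec into it later)
docker run --privileged --gpus all -d --name vllm-head \
  --network host --ipc=host \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  scitrera/dgx-spark-vllm:0.13.0-t4 \
  ray start --head --port=6379 --block

# Each worker node: join the cluster (replace HEAD_IP with the head node's address)
docker run --privileged --gpus all -d \
  --network host --ipc=host \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  scitrera/dgx-spark-vllm:0.13.0-t4 \
  ray start --address=HEAD_IP:6379 --block

# Head node: serve with parallelism spanning the cluster's GPUs
docker exec -it vllm-head \
  vllm serve Qwen/Qwen3-1.7B --tensor-parallel-size 2

Whether tensor parallelism is the right split across nodes depends on your interconnect; --pipeline-parallel-size is the usual alternative for slower links.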
Example usage
docker run \
--privileged \
--gpus all \
-it --rm \
--network host --ipc=host \
-v ~/.cache/huggingface:/root/.cache/huggingface \
scitrera/dgx-spark-vllm:0.13.0-t4 \
vllm serve \
Qwen/Qwen3-1.7B \
--gpu-memory-utilization 0.7
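With --network host, the server's OpenAI-compatible API is reachable on the host at vLLM's default port 8000, so a quick smoke test looks like:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-1.7B", "messages": [{"role": "user", "content": "Say hello."}]}'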
Tag semantics
- -t4 → Transformers 4.x
  - Example: 0.13.0-t4 = vLLM 0.13.0 + Transformers 4.57.5
- -t5 → Transformers 5.x (pre-release)
  - Example: 0.14.0rc2-t5 = vLLM 0.14.0rc2 + Transformers 5.0.0rc3
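So, for example, pulling the Transformers 5 pre-release variant is just:

docker pull scitrera/dgx-spark-vllm:0.14.0rc2-t5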
Inspecting package versions
Major component versions are embedded as Docker labels:
docker inspect scitrera/dgx-spark-vllm:0.14.0rc2-t4 \
--format '{{json .Config.Labels}}' | jq
Example output:
{
"dev.scitrera.cuda_version": "13.1.0",
"dev.scitrera.flashinfer_version": "0.6.1",
"dev.scitrera.nccl_version": "2.28.9-1",
"dev.scitrera.torch_version": "2.10.0-rc6",
"dev.scitrera.transformers_version": "4.57.5",
"dev.scitrera.triton_version": "3.5.1",
"dev.scitrera.vllm_version": "0.14.0rc2"
}
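To read a single label without jq, Docker's Go templating can index the label map directly (here, the vLLM version):

docker inspect scitrera/dgx-spark-vllm:0.14.0rc2-t4 \
--format '{{index .Config.Labels "dev.scitrera.vllm_version"}}'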
Notes
- Updated NCCL (newer than the build that ships with stock PyTorch 2.9.1)
- PyTorch, Triton, and vLLM are rebuilt accordingly
- These images are early / experimental
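A quick way to confirm the rebuilt stack from inside a container (a sketch; it assumes python3 is on the image's PATH):

docker run --rm --gpus all scitrera/dgx-spark-vllm:0.13.0-t4 \
python3 -c 'import torch; print(torch.__version__, torch.version.cuda, torch.cuda.nccl.version())'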
For faster iteration on vLLM, I'd recommend @eugr's repo:
https://github.com/eugr/spark-vllm-docker
Long-term maintenance, support, and feedback plans are still TBD.