| Topic | Replies | Views | Date |
|---|---|---|---|
| Running a Full LLM Stack on DGX Spark GB10 (Your Application -> LiteLLM -> llama-swap -> vLLM / llama.cpp / Ollama) | 10 | 587 | April 27, 2026 |
| Managing Local LLM Orchestration | 12 | 1401 | April 23, 2026 |
| DGX Spark + Qwen3-Next-80B: Proven Performance, But Missing Clear Path to NIM, TensorRT-LLM & Web UIs | 16 | 4019 | March 6, 2026 |
| DGX Spark: The Sovereign AI Stack — Dual-Model Architecture for Local Inference | 9 | 1641 | February 13, 2026 |
| DGX Spark performance | 50 | 4346 | February 27, 2026 |
| New pre-built vLLM Docker Images for NVIDIA DGX Spark | 73 | 7496 | March 27, 2026 |
| Moving from Mac to NVIDIA: bought powerful hardware, but drowning in configs | 37 | 2327 | February 25, 2026 |
| New bleeding-edge vLLM Docker Image: avarok/vllm-nvfp4-gb10-sm120 | 35 | 2882 | December 31, 2025 |
| Step-3.5-Flash on Single Spark with 256k context | 2 | 549 | March 3, 2026 |
| DGX Spark is extremely slow on a short LLM test | 21 | 3699 | January 25, 2026 |