When you get 30 t/s, is that with the configuration shown in step 3? If so, how do you measure it? Does it include the thinking phase plus the actual answer? For example, when I asked “Can you describe the sun in a few short sentences”, it spent 16 seconds thinking and then wrote a two-line answer in 3 seconds. That is basically one line per second.
Thank you, super helpful! I was trying to manually update nvcr.io/nvidia/vllm:26.02-py3 but kept hitting walls, so thanks again. Takes about 4 minutes to launch on my system.
Looks like the nightly build of vllm/vllm-openai:cu130-nightly is running CUDA 13.0.1, but the NVIDIA vLLM image is running CUDA 13.1.1. Any concerns or improvements there?
Your docker run does not include “--kv-cache-dtype fp8”, which I’ve seen others recommend for memory savings. Any thoughts or suggestions there?
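In case it helps anyone trying this, a sketch of where the flag would go in a launch like the one in the guide. The image tag, model name, and port here are placeholders, not the exact values from the guide; `--kv-cache-dtype fp8` itself is a real vLLM server option.

```shell
# Hypothetical launch showing where --kv-cache-dtype goes; substitute your
# own image, model, and port. The flag is passed to the vLLM server, after
# the image name.
docker run --gpus all --ipc=host -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --kv-cache-dtype fp8
```

Storing the KV cache in FP8 roughly halves its memory footprint versus a 16-bit cache, which lets you fit longer contexts or more concurrent requests, at the cost of a possible small accuracy hit; worth benchmarking on your own workload before committing to it.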
Thank you!