Hey everyone! I just open-sourced my setup for running Qwen3.5-35B-A3B locally with llama.cpp and openclaw on the DGX Spark (GB10).
It took some digging to get everything working. The main pain points were role compatibility (mapping the developer/toolResult roles to what Qwen3.5 actually expects), chunked transfer encoding, and per-request thinking-mode control.
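For anyone curious what the role-compatibility fix looks like: the idea is that clients send OpenAI-style roles the model's chat template doesn't know, so the proxy rewrites them before forwarding. This is just an illustrative sketch, not the actual proxy code; the role names and target mappings are my assumptions.

```python
# Illustrative sketch of role remapping in a chat proxy.
# Assumption: the chat template accepts "system" and "tool" roles,
# but not "developer" or "toolResult".
ROLE_MAP = {
    "developer": "system",
    "toolResult": "tool",
}

def remap_messages(messages):
    """Return a copy of an OpenAI-style message list with
    unsupported roles rewritten; known roles pass through."""
    return [
        {**m, "role": ROLE_MAP.get(m.get("role"), m.get("role"))}
        for m in messages
    ]
```

The proxy applies this to each incoming request body before handing it to llama.cpp, so neither side has to know about the other's role conventions.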
Ended up writing a small proxy that handles all of it transparently: tool calls, streaming, and a [think] keyword to toggle extended reasoning on demand all work.
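A per-request toggle like the [think] keyword could work roughly like this: the proxy looks for the keyword at the start of the latest user message, strips it, and returns a flag it can then translate into whatever reasoning switch the backend expects. The function name and the exact keyword handling here are my guesses at the mechanism, not the real implementation.

```python
def extract_think_flag(messages):
    """Strip a leading '[think]' keyword from the most recent user
    message and report whether it was present.

    Returns (messages, thinking_enabled). Illustrative only; the
    real proxy may parse this differently.
    """
    msgs = [dict(m) for m in messages]  # shallow copy, don't mutate caller's list
    for m in reversed(msgs):
        if m.get("role") == "user":
            text = m.get("content", "")
            if text.lstrip().startswith("[think]"):
                # Remove the keyword and surrounding whitespace.
                m["content"] = text.lstrip()[len("[think]"):].lstrip()
                return msgs, True
            break  # only the latest user message counts
    return msgs, False
```

The returned flag would then be mapped to the backend's reasoning control (e.g. a chat-template switch), so each request can opt into extended reasoning independently.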
~43 tok/s generate / ~63 tok/s prefill on the GB10.
Full scripts to go from zero to working in one command:
Happy to answer questions if anyone’s trying to get a similar setup running.