Guide: llama.cpp + Qwen3.5-35B-A3B + openclaw on GB10

Hey everyone! I just open-sourced my setup for running Qwen3.5-35B-A3B locally with llama.cpp and openclaw on the DGX Spark (GB10).
It took some digging to get everything working — the main pain points were role compatibility (developer/toolResult → what Qwen3.5 actually expects), chunked transfer encoding, and per-request thinking mode control.
Ended up writing a small proxy that handles all of it transparently. Tool calls, streaming, and the [think] keyword to toggle extended reasoning on demand — all working.
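For anyone curious what the role-compatibility part of such a proxy involves, here's a minimal sketch of the message normalization it might do. The role names (`developer`, `toolResult`) and the `[think]` keyword handling are assumptions based on the description above, not the actual open-sourced code:

```python
# Hypothetical sketch: normalize client-side roles into ones Qwen3.5's
# chat template accepts, and pull the [think] toggle out of the last
# user turn. Role names here are assumptions, not the real proxy's code.

def normalize(messages):
    """Return (normalized_messages, enable_thinking)."""
    out = []
    enable_thinking = False
    for msg in messages:
        role = msg.get("role")
        if role == "developer":        # OpenAI-style "developer" -> "system"
            role = "system"
        elif role == "toolResult":     # non-standard tool-result role -> "tool"
            role = "tool"
        out.append({**msg, "role": role})
    # [think] in the final user message toggles extended reasoning on,
    # and is stripped before the prompt is forwarded to llama.cpp.
    if out and out[-1]["role"] == "user" and "[think]" in out[-1].get("content", ""):
        enable_thinking = True
        out[-1]["content"] = out[-1]["content"].replace("[think]", "").strip()
    return out, enable_thinking
```

A quick example of how it would behave:

```python
msgs, think = normalize([
    {"role": "developer", "content": "Be terse."},
    {"role": "user", "content": "[think] solve this"},
])
# msgs[0]["role"] == "system", think is True, msgs[1]["content"] == "solve this"
```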
~43 tok/s generate / ~63 tok/s prefill on the GB10.
Full scripts to go from zero to working in one command:

Happy to answer questions if anyone’s trying to get a similar setup running.


Hi! I have OpenClaw running on a remote VPS connected to Tailscale, and my DGX Spark is also on Tailscale. Both can 'see' each other, and the Spark resolves as dgx-spark. Can I just point it at that instead of 127.0.0.1?

This is fantastic! Thank you so much for this. Just one question: why did you set the context to ~128k when the model supports up to 256k?

Thanks. The proxy answered my question about why openclaw wasn't calling Qwen3.5 correctly.

Thanks for the guide. I will move this to GB10 Projects