Hey everyone! I just open-sourced my setup for running Qwen3.5-35B-A3B locally with llama.cpp and openclaw on the DGX Spark (GB10).
It took some digging to get everything working. The main pain points were role compatibility (mapping the developer/toolResult roles to what Qwen3.5 actually expects), chunked transfer encoding, and per-request thinking-mode control.
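For anyone curious what the role-compatibility fix looks like: the idea is that clients send OpenAI-style roles the model's chat template doesn't know, so the proxy rewrites them before forwarding. This is just an illustrative sketch, not the actual proxy code; the role names and target mappings are my assumptions.

```python
# Illustrative sketch of role remapping in a chat proxy.
# Assumption: the chat template accepts "system" and "tool" roles,
# but not "developer" or "toolResult".
ROLE_MAP = {
    "developer": "system",
    "toolResult": "tool",
}

def remap_messages(messages):
    """Return a copy of an OpenAI-style message list with
    unsupported roles rewritten; known roles pass through."""
    return [
        {**m, "role": ROLE_MAP.get(m.get("role"), m.get("role"))}
        for m in messages
    ]
```

The proxy applies this to each incoming request body before handing it to llama.cpp, so neither side has to know about the other's role conventions.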
Ended up writing a small proxy that handles all of it transparently: tool calls, streaming, and a [think] keyword to toggle extended reasoning on demand all work.
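A per-request toggle like the [think] keyword could work roughly like this: the proxy looks for the keyword at the start of the latest user message, strips it, and returns a flag it can then translate into whatever reasoning switch the backend expects. The function name and the exact keyword handling here are my guesses at the mechanism, not the real implementation.

```python
def extract_think_flag(messages):
    """Strip a leading '[think]' keyword from the most recent user
    message and report whether it was present.

    Returns (messages, thinking_enabled). Illustrative only; the
    real proxy may parse this differently.
    """
    msgs = [dict(m) for m in messages]  # shallow copy, don't mutate caller's list
    for m in reversed(msgs):
        if m.get("role") == "user":
            text = m.get("content", "")
            if text.lstrip().startswith("[think]"):
                # Remove the keyword and surrounding whitespace.
                m["content"] = text.lstrip()[len("[think]"):].lstrip()
                return msgs, True
            break  # only the latest user message counts
    return msgs, False
```

The returned flag would then be mapped to the backend's reasoning control (e.g. a chat-template switch), so each request can opt into extended reasoning independently.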
~43 tok/s generate / ~63 tok/s prefill on the GB10.
Full scripts to go from zero to working in one command:
Happy to answer questions if anyone’s trying to get a similar setup running.