I find the dense one very slow on GB10, but I am thinking about using an orchestrator to switch the second one between gemma4 dense, qwen3.6 dense, and gemma4 MoE. Qwen3.6 MoE serves very well for coding purposes.
gokhan.moral