Will you be looking into https://atlasinference.io/#models? Since it's built for Spark and RTX specifically and is a lot smaller, things like call overhead could be addressed more easily / might not become an issue :)
norman.2