I’m specifically suggesting support for the Qwen3.5-35B-A3B and 27B models, both of which are VL models and extremely capable for their size. The current large model is powerful but clunky for most tasks.
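Until there's an official recipe, here's a minimal sketch of serving one of these checkpoints with vLLM's offline Python API. The Hugging Face repo ID `Qwen/Qwen3.5-35B-A3B` is an assumption on my part, based on the model naming in this thread, so substitute whatever ID actually ships:

```python
# Minimal sketch: text-only generation against an assumed
# Qwen3.5-35B-A3B checkpoint using vLLM's offline API.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3.5-35B-A3B",  # assumed repo ID; not yet confirmed
    max_model_len=32768,           # cap context to keep the KV cache in memory
    trust_remote_code=True,        # Qwen releases often ship custom model code
)

params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(
    ["Summarize what a vision-language model adds over a text-only LLM."],
    params,
)
print(outputs[0].outputs[0].text)
```

For the vision side, vLLM exposes the same model through its OpenAI-compatible server, so image inputs can go through standard chat-completions `image_url` content parts once the architecture is supported.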