Some Blackwell optimizations coming for llama.cpp