Try this - Nemotron-3-Super-120B at 20-22 tok/s Super Special Recipe
deanc