FYI, I released the images at New pre-built vLLM Docker Images for NVIDIA DGX Spark; I don't think GLM-4.7-Flash works on the latest image, but I was planning to release a new version tomorrow to coincide with the PyTorch 2.10 release. I don't think the stock vLLM 0.14.0 release will work either; a key commit landed right after 0.14.0 was cut. I might include it in tomorrow's updated 0.14.0 image, though, given that GLM-4.7-Flash is pretty key…
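For anyone who wants to sanity-check one of these images once the container is up, here's a minimal smoke test against vLLM's OpenAI-compatible endpoint. This is a sketch, not part of the image itself: the port, the `docker run` invocation in the comment, and the API key value are all assumptions; substitute whatever you actually run.

```python
# Minimal smoke test for a vLLM container's OpenAI-compatible API.
# Assumes the server was started with something like (image tag and
# model ID are placeholders, not the actual published names):
#   docker run --gpus all -p 8000:8000 <image> vllm serve <model>
from openai import OpenAI

# vLLM doesn't require a real API key by default; any string works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Listing /v1/models confirms the model loaded without crashing on startup.
models = client.models.list()
print("Served models:", [m.id for m in models.data])

# One short completion verifies the forward pass actually runs.
resp = client.chat.completions.create(
    model=models.data[0].id,
    messages=[{"role": "user", "content": "Say hello in five words."}],
    max_tokens=32,
)
print(resp.choices[0].message.content)
```

If the model list comes back but the completion hangs or errors, that's usually a sign the image's vLLM build doesn't yet support the architecture (the GLM-4.7-Flash situation described above) rather than a Docker problem.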