FYI, I released the images at New pre-built vLLM Docker Images for NVIDIA DGX Spark; I don't think GLM-4.7-Flash works on the latest image, but I was planning to release a new version tomorrow to coincide with the PyTorch 2.10 release. I don't think the stock vLLM 0.14.0 release will work either; a key commit landed right after 0.14.0 was cut. I might include it in tomorrow's updated 0.14.0 image, though, given that GLM-4.7-Flash is pretty key…
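For anyone who wants to sanity-check one of these images once the container is up, here's a minimal smoke test against vLLM's OpenAI-compatible endpoint. This is a sketch, not part of the image itself: the port, the `docker run` invocation in the comment, and the API key value are all assumptions; substitute whatever you actually run.

```python
# Minimal smoke test for a vLLM container's OpenAI-compatible API.
# Assumes the server was started with something like (image tag and
# model ID are placeholders, not the actual published names):
#   docker run --gpus all -p 8000:8000 <image> vllm serve <model>
from openai import OpenAI

# vLLM doesn't require a real API key by default; any string works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Listing /v1/models confirms the model loaded without crashing on startup.
models = client.models.list()
print("Served models:", [m.id for m in models.data])

# One short completion verifies the forward pass actually runs.
resp = client.chat.completions.create(
    model=models.data[0].id,
    messages=[{"role": "user", "content": "Say hello in five words."}],
    max_tokens=32,
)
print(resp.choices[0].message.content)
```

If the model list comes back but the completion hangs or errors, that's usually a sign the image's vLLM build doesn't yet support the architecture (the GLM-4.7-Flash situation described above) rather than a Docker problem.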