Unfortunately, if you don’t use two Sparks, you’re basically wasting $1,500, because you’re not using the InfiniBand module you already have. Training throughput also scales linearly as you expand the cluster.