Yes, but I’ve had some weird issues with this quant, while QuantTrio worked fine. I don’t remember what the issue was; it should be somewhere in the forum threads :)
EDIT: found it - How to run GLM 4.7 on dual DGX Sparks with vLLM / mods support in spark-vllm-docker - #5 by eugr
Not sure if that issue still persists, but the QuantTrio version is smaller, so I can fit 128K context on my two Sparks with an fp8 KV cache.
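Rough back-of-the-envelope for why fp8 KV cache matters at 128K: the cache grows linearly with context, and fp8 uses 1 byte per element instead of 2 for bf16, so it halves the cache. The layer/head numbers below are made-up placeholders, NOT GLM 4.7's real config (check the model's config.json for those). The per-node split also assumes tensor parallelism divides KV heads evenly, and the sketch ignores vLLM's paging/block overhead:

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, num_tokens, bytes_per_elem):
    # Factor of 2: each layer stores one K and one V tensor per token.
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_elem

# HYPOTHETICAL model shape for illustration only, not GLM 4.7's actual config.
LAYERS = 90
KV_HEADS = 8
HEAD_DIM = 128
CONTEXT = 128 * 1024  # 128K tokens
NUM_NODES = 2         # assumes the cache is split evenly across two Sparks

for name, width in [("bf16", 2), ("fp8", 1)]:
    total = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT, width)
    print(f"{name}: {total / 2**30:.1f} GiB total, "
          f"{total / NUM_NODES / 2**30:.1f} GiB per node")
```

Whatever the real shape is, the ratio holds: the fp8 cache is half the bf16 one, and a smaller quant leaves more memory for it.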