PSA: State of FP4/NVFP4 Support for DGX Spark in vLLM

I found the root cause:

SM120/SM121 (DGX Spark, RTX 50) have only 99 KB of shared memory (SMEM), versus 228 KB on SM100. The K=128 block-scaled MoE GEMM tiles compile, but they overflow SMEM at runtime on SM120. The K=64 tiles that would fit can't compile yet because of two unfixed CUTLASS bugs.

So the real problem isn't instruction incompatibility; it's SMEM capacity. The K=128 tiles simply need more than the 99 KB that SM120 offers, while the same tiles fit comfortably in SM100's 228 KB.
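
To make the capacity argument concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from CUTLASS or vLLM: the 128x128 tile shape, the 6-stage pipeline, the 16-element scale-factor granularity, and the function names are all assumptions picked for illustration, and it ignores epilogue and barrier storage. Only the 99 KB and 228 KB limits come from the numbers above.

```python
# Rough SMEM estimate for a block-scaled FP4 GEMM mainloop.
# Tile shapes, stage count, and scale-factor layout are HYPOTHETICAL;
# real CUTLASS kernels also need epilogue and barrier storage.

KB = 1024
SMEM_LIMIT = {"SM100": 228 * KB, "SM120": 99 * KB}  # figures quoted in this post


def mainloop_smem_bytes(tile_m, tile_n, tile_k, stages, sf_vec=16):
    """Bytes of SMEM used by the pipelined A/B tiles and their scale factors."""
    a_bytes = tile_m * tile_k // 2  # FP4: two elements per byte
    b_bytes = tile_n * tile_k // 2
    # Assumed: one 1-byte scale factor per sf_vec elements along K
    sf_bytes = (tile_m + tile_n) * (tile_k // sf_vec)
    return stages * (a_bytes + b_bytes + sf_bytes)


def fits(arch, tile_m, tile_n, tile_k, stages):
    """True if the estimated mainloop footprint is within the arch's SMEM limit."""
    return mainloop_smem_bytes(tile_m, tile_n, tile_k, stages) <= SMEM_LIMIT[arch]


if __name__ == "__main__":
    for k in (128, 64):
        used = mainloop_smem_bytes(128, 128, k, stages=6)
        for arch in ("SM100", "SM120"):
            verdict = "fits" if fits(arch, 128, 128, k, 6) else "overflows"
            print(f"K={k}: {used // KB} KB {verdict} on {arch}")
```

Under these assumed parameters the K=128 configuration needs about 108 KB, which is over the 99 KB limit on SM120 but well under 228 KB on SM100, while the K=64 configuration needs about 54 KB and would fit on both. The real kernels will have different numbers, but the shape of the problem is the same.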
