vLLM on GB10: gpt-oss-120b MXFP4 slower than SGLang/llama.cpp... what’s missing?

NVFP4/FP4 isn’t being properly utilized on DGX Spark (GB10 / sm121) in current vLLM builds, so NVFP4 quants can end up slower than AWQ 4-bit on the same workload. The FP4 kernels / NVFP4 paths are better optimized for sm120 (RTX 50xx / RTX Pro 6000) than for Spark’s sm121. In short: installs have gotten much smoother (cu130 wheels, better Docker tooling, cluster scripts), but NVFP4 performance on Spark still isn’t there yet.
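
If you want to confirm which of the two targets your box actually reports, here is a minimal sketch. It assumes PyTorch with CUDA support is installed; `torch.cuda.get_device_capability` is the real PyTorch call, while `sm_name`, `describe` and `detect_capability` are hypothetical helpers made up for this post. The sm120 / sm121 mapping just restates the claim above, it is not something read out of vLLM.

```python
from typing import Optional, Tuple


def sm_name(major: int, minor: int) -> str:
    """Format a CUDA compute capability such as (12, 1) as 'sm121'."""
    return f"sm{major}{minor}"


def describe(major: int, minor: int) -> str:
    """Map a capability to the two targets discussed in this thread.

    The mapping restates the post's claim; it is an assumption, not a vLLM lookup.
    """
    name = sm_name(major, minor)
    if name == "sm121":
        return "sm121: DGX Spark (GB10), FP4/NVFP4 paths reportedly less optimized"
    if name == "sm120":
        return "sm120: RTX 50xx / RTX Pro 6000, FP4/NVFP4 paths reportedly better optimized"
    return f"{name}: not one of the targets discussed here"


def detect_capability() -> Optional[Tuple[int, int]]:
    """Return (major, minor) for GPU 0, or None if PyTorch/CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = detect_capability()
    if cap is None:
        print("No CUDA device visible to PyTorch")
    else:
        print(describe(*cap))
```

This only tells you which architecture you are on; it says nothing about which kernel vLLM actually picks at runtime, so treat it as a sanity check before benchmarking NVFP4 against AWQ.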