I’m not that experienced with vLLM, but I’d like to know whether it’s possible to deploy DeepSeek-V4-Pro (nvidia/DeepSeek-V4-Pro-NVFP4) across all 8 of my nodes.
I believe that quant is only about 850 GiB in total, against the 976 GiB of usable memory you have access to, so it should fit. That said, I would recommend looking into GLM-5.2. By all appearances it is a substantial step up from DSV4 Pro, and it would also fit in your 8xSpark cluster.
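For a quick sanity check on that headroom, here is a back-of-the-envelope sketch using only the two figures quoted above (about 850 GiB of weights, 976 GiB usable across 8 nodes). It assumes the weights shard evenly across nodes, which may not hold in practice, and the function name is just illustrative.

```python
# Rough memory-headroom check using the figures quoted in this thread.
# Assumption: weights are split evenly across nodes; real sharding may differ.
NODES = 8
USABLE_GIB_TOTAL = 976   # usable memory across the cluster (from the post)
WEIGHTS_GIB_TOTAL = 850  # approximate size of the NVFP4 quant (from the post)


def headroom(usable_gib, weights_gib, nodes):
    """Return (total, per-node) GiB left over after loading the weights."""
    total = usable_gib - weights_gib
    return total, total / nodes


total_left, per_node_left = headroom(USABLE_GIB_TOTAL, WEIGHTS_GIB_TOTAL, NODES)

print(f"usable per node:   {USABLE_GIB_TOTAL / NODES:.2f} GiB")
print(f"weights per node:  {WEIGHTS_GIB_TOTAL / NODES:.2f} GiB")
print(f"headroom total:    {total_left} GiB")
print(f"headroom per node: {per_node_left:.2f} GiB")
```

Whatever is left over has to cover the KV cache, activations, and runtime overhead, so the margin bounds how much context length and concurrency you can expect, not just whether the model loads.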
@pakasio, yes, it’s possible, but getting it running currently requires software versions from outside the primary sources, as discussed in some of the DeepSeek-V4 threads here. An 8x cluster can run the current top open-source models, such as DeepSeek-V4-Pro, Kimi-K2.6, Kimi-K2.7-Code, MiniMax-M3, and GLM-5.2-FP8. These models are close to the current state-of-the-art closed-source models, with the added benefit that they never get reduced to a lower-quality quant the way proprietary cloud models do under heavy load, so it’s a very nice capability for the price.