MiniMax M3: NVFP4 for Quad DGX Spark

I think the sentence “Larger than M2 (200B model)” is simply pointing out that M2.x was around a 200B-parameter model. The new model is multimodal, so I’d expect it to be at least twice the size of M2, but definitely under 1T parameters, since training a trillion-parameter model would be extremely costly.

Step-3.7-Flash went multimodal, compared to 3.5, with a simple 3.5 GB mmproj.

It’s bigger, but multimodality doesn’t require an explosion in size.

Let’s hope it can comfortably fit on dual Sparks. In my limited testing on the MiniMax Agent website, the model feels better than M2.7 but doesn’t feel like a 500B-class model in terms of intelligence.
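
For a rough sense of what "fits" would mean, here is a back-of-the-envelope sketch, not a measurement. It assumes NVFP4 costs about 4.5 bits per weight (a 4-bit value plus one FP8 scale per 16-weight block), that every weight is quantized, and that each DGX Spark has 128 GB of unified memory. The `usable_fraction` of 0.85 and the parameter counts in the loop are my own placeholder guesses, not anything MiniMax has announced, and KV cache and activations are ignored entirely.

```python
import math

SPARK_MEMORY_GB = 128                 # unified memory per DGX Spark
NVFP4_BITS_PER_WEIGHT = 4 + 8 / 16    # 4-bit value + one FP8 scale per 16 weights


def nvfp4_weight_gb(params_billion: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes), weights only."""
    total_bits = params_billion * 1e9 * NVFP4_BITS_PER_WEIGHT
    return total_bits / 8 / 1e9


def sparks_needed(params_billion: float, usable_fraction: float = 0.85) -> int:
    """Smallest number of Sparks whose usable memory holds the weights.

    usable_fraction is an assumed allowance for the OS and runtime;
    it is a guess, not a measured value.
    """
    usable_gb = SPARK_MEMORY_GB * usable_fraction
    return math.ceil(nvfp4_weight_gb(params_billion) / usable_gb)


if __name__ == "__main__":
    # Hypothetical model sizes, in billions of parameters.
    for size in (200, 400, 500):
        print(f"{size}B: ~{nvfp4_weight_gb(size):.1f} GB weights, "
              f"{sparks_needed(size)} Spark(s) for weights alone")
```

Under those assumptions, a model at roughly twice M2's size is already past what two Sparks can hold once some headroom is reserved, so the real parameter count matters a lot for the dual-Spark case.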