MiniMax M3 : NVFP4 for Quad DGX Spark

karol.spark · June 4, 2026, 2:05am

I think the sentence “Larger than M2 (200B model)” is simply pointing out that M2.x was around a 200B parameter model. The new model is multimodal, so I’d expect it to be at least twice the size of M2, but definitely under 1T parameters, since training a trillion-parameter model would be extremely costly.

jwarner · June 4, 2026, 3:07am

Step-3.7-Flash went multimodal relative to 3.5 with just a 3.5 GB mmproj.

It’s bigger, but multimodality doesn’t require an explosion in size.

corbett_korbett · June 4, 2026, 10:27pm

Let’s hope it fits comfortably on dual Sparks. In my limited testing on the MiniMax Agent website, the model feels better than M2.7, but its intelligence doesn’t feel 500B-class.

mrtime · June 11, 2026, 1:58pm

corbett_korbett · June 11, 2026, 11:24pm

It’s supposed to be out sometime tomorrow. But there’s no vLLM PR yet, so they could delay the model again like they did with M2.7. We’ll see.

mrtime · June 11, 2026, 11:34pm

They already published some code: MiniMax-AI/MSA on GitHub.

corbett_korbett · June 12, 2026, 12:17am

It’s got a Spark PR as well: “Add SM12x MSA Support (Tested on DGX Spark, linux/arm64/SM121)” by dbotwinick, Pull Request #1 in MiniMax-AI/MSA on GitHub.

entrpi · June 12, 2026, 12:27am

Weights will be released sometime in the next day.

corbett_korbett · June 12, 2026, 12:30am

I really hope the license is good and not as vague as the last one, where making a single dollar off an M3 output in your small business would fall under the no-commercial-use term. Most people running a model on $3k-$15k hardware are trying to get some benefit out of the models they run, not just “make me a GTA clone with a synthwave style” type prompts.

mrtime · June 12, 2026, 12:23pm

github.com/vllm-project/vllm

[Model] Add MiniMax M3 support (#45381)

main ← m3_release

opened 07:39AM - 12 Jun 26 UTC

youkaichao

+14741 -323

## Summary
- Add MiniMax M3 model support across config, processors, model registry, AMD/NVIDIA model implementations, MTP, sparse attention, and warmup paths.
- Add MiniMax M3 reasoning and tool parsers, including Rust frontend registrations and Python-facing parser wrappers.
- Add supporting kernels, quantization paths, router GEMM shape support, and targeted tests.

## Duplicate-work check
- Open PR searches for `MiniMax M3` and `minimax_m3` found no duplicates. Broader `M3 model` results were unrelated.

FIX https://github.com/vllm-project/vllm/issues/45360

## Tests
- `cargo fmt --manifest-path rust/Cargo.toml --all -- --check`
- `cargo test --manifest-path rust/Cargo.toml -p vllm-reasoning-parser -p vllm-tool-parser -p vllm-chat`

## Notes
- AI assistance was used to prepare this one-commit release branch and resolve conflicts against current main.

paxren2020 · June 12, 2026, 12:27pm

…

jwarner · June 12, 2026, 2:05pm

I’m sure it isn’t malicious, just avoiding scope creep so they can efficiently get day 0 support.

As soon as that PR is ported to vLLM we can just pull it in.

ekkis · June 12, 2026, 2:06pm

MiniMax-M3 is a native multimodal model with 1M context. It has ~428B parameters and ~23B activated parameters.

That’s about 850 GB, so we’ll have to wait for quants. It most likely won’t fit on a 2x cluster, but there’s plenty of room on a 4x.
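The size figures above can be sanity-checked with some quick arithmetic. The following is a minimal sketch, not a measurement of any released checkpoint: the 428B parameter count comes from the quoted model card, while the bits-per-weight values are assumptions (NVFP4 taken as 4-bit values plus one 8-bit scale per 16-weight block, W4A16 taken as 4-bit values plus one 16-bit scale per assumed 128-weight group). It counts weights only and ignores KV cache, activations, and any layers a real quant leaves unquantized.

```python
# Rough weight-only footprint for a 428B-parameter model at several precisions.
# Bits-per-weight values are assumptions, not measurements of a released checkpoint.
PARAMS = 428e9  # total parameters, from the model card quoted in the thread

BITS_PER_WEIGHT = {
    "bf16": 16.0,
    "fp8": 8.0,
    "nvfp4": 4.5,    # 4-bit values + one 8-bit scale per 16-weight block
    "w4a16": 4.125,  # 4-bit values + one 16-bit scale per 128-weight group (assumed)
}


def weight_gb(params: float, bits: float) -> float:
    """Weight storage in decimal gigabytes, ignoring KV cache and activations."""
    return params * bits / 8 / 1e9


footprints = {name: weight_gb(PARAMS, bits) for name, bits in BITS_PER_WEIGHT.items()}

for name, gb in footprints.items():
    print(f"{name:>6}: {gb:7.1f} GB")
```

Under these assumptions the BF16 figure lands close to the ~850 GB mentioned above, and the 4-bit variants sit just under the 256 GB raw total of two 128 GB Sparks before any runtime overhead is counted.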

jwarner · June 12, 2026, 2:07pm

Chunky: 428B-A23B. I was wrong on the size; that’s going to be difficult to fit on 2x Sparks.

karol.spark · June 12, 2026, 2:09pm

I knew it! This confirms my initial expectations: the footprint is double that of the MiniMax 2.7 model. I am currently evaluating whether it can be successfully deployed across my two Sparks.

paxren2020 · June 12, 2026, 2:13pm

It seems that Qwen 397B, which is close in scale but slightly smaller, already barely fits into a dual-Spark setup using Intel INT4 quantization. MiniMax is a bit larger, so it feels like it won’t fit into two Sparks at all.

Maybe a REAP version…

MiaAI_Lab · June 12, 2026, 2:16pm

It probably won’t fit on 2x DGX Sparks. Perhaps with NVFP4. But 3 Sparks would probably be a good fit.
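A capacity-only way to estimate the node count is sketched below. Everything except the 428B parameter count is an assumption: 128 GB of unified memory per Spark, a hypothetical 85% of it left for weights after the OS, KV cache, and activations, and an NVFP4-style cost of 4.5 bits per weight. The helper `min_nodes` is illustrative, not part of any tool.

```python
import math

NODE_MEMORY_GB = 128.0   # unified memory per DGX Spark (assumed)
USABLE_FRACTION = 0.85   # hypothetical share left for weights after OS, KV cache, activations


def min_nodes(weights_gb: float, node_gb: float = NODE_MEMORY_GB,
              usable: float = USABLE_FRACTION) -> int:
    """Smallest node count whose usable memory covers the weights."""
    return math.ceil(weights_gb / (node_gb * usable))


# 428e9 parameters at an assumed 4.5 bits per weight (NVFP4-style)
nvfp4_gb = 428e9 * 4.5 / 8 / 1e9
print(f"weights: {nvfp4_gb:.2f} GB, nodes needed: {min_nodes(nvfp4_gb)}")
```

This only checks capacity. Whether a given serving stack can actually split the model across that many nodes is a separate question, and the result is sensitive to the headroom fraction chosen.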

karol.spark · June 12, 2026, 2:29pm

Managed to run Nex-N2-Pro, which is based on Qwen 397B:

Nex-N2-Pro in W4A16 quant

Same model with NVFP4 quantization:

and Qwen 397B in NVFP4:

If W4A16 quantization becomes available, we will be able to run this model on a dual Spark setup.

paxren2020 · June 12, 2026, 2:48pm

This Intel version barely fits as it is. MiniMax is roughly 8% larger. If you add 8% to the size you mentioned, it will exceed the Intel version’s footprint, meaning it won’t fit into the Sparks…

Though maybe it’ll turn out to be exactly the same size… and it will fit after all.))
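The "roughly 8%" figure follows from the parameter counts named in the thread. A small sketch, assuming both models would be quantized in the same format so that checkpoint size scales linearly with parameter count (real checkpoints can deviate, for example when some layers stay unquantized). The helper `scaled_footprint_gb` and the 128 GB per-Spark figure are assumptions for illustration.

```python
QWEN_PARAMS = 397e9      # "Qwen 397B" as named in the thread
MINIMAX_PARAMS = 428e9   # MiniMax-M3 total parameters from the model card

ratio = MINIMAX_PARAMS / QWEN_PARAMS
print(f"MiniMax-M3 is {100 * (ratio - 1):.1f}% larger by parameter count")


def scaled_footprint_gb(reference_gb: float) -> float:
    """Scale a reference checkpoint size by the parameter ratio (same quant format assumed)."""
    return reference_gb * ratio


DUAL_SPARK_GB = 2 * 128.0  # raw memory of two Sparks (assumed 128 GB each)

# Largest reference footprint that would still be under the raw dual-Spark total after scaling
max_reference_gb = DUAL_SPARK_GB / ratio
print(f"reference checkpoint must be under {max_reference_gb:.1f} GB")
```

Plugging the on-disk size of the Intel INT4 checkpoint into `scaled_footprint_gb` gives the projected MiniMax-M3 size under the same assumption.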

karol.spark · June 12, 2026, 2:52pm

Our home.. our shield… our pride.. our history 🤍🤍🤍🤍🤍🤍

Let’s wait and see
