Maybe some of you have seen those videos from Chinese social media of people with shelves of Mac Minis/Studios/etc. running AI, since Nvidia hardware has been banned there (at different levels at different times). I’m just wondering what the experience is like compared to our beloved Sparkies, including on the software side.
Can’t seem to buy the 512 GB model anymore, even in the US, because they’re sold out. Performance is a bit different, but at least on a capacity level, one would be equivalent to 4 DGX Sparks for running a 4-bit quant of GLM 5.1.
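Back-of-the-envelope math for that capacity claim (a sketch; the ~400B parameter count, the 0.5 bytes/param for a 4-bit quant, and the 20% overhead factor are my assumptions, not published specs):

```python
# Rough fit check: does a 4-bit quant of a ~400B-param model fit in 512 GB?
# Assumptions (mine, not from the thread): 4-bit quant ~= 0.5 bytes/param,
# ~20% extra for KV cache and runtime buffers, 128 GB per DGX Spark.
params_b = 400                 # billions of parameters (illustrative)
weights_gb = params_b * 0.5    # 4-bit -> 0.5 bytes per parameter
total_gb = weights_gb * 1.2    # overhead fudge factor

mac_gb = 512                   # one Mac Studio, 512 GB unified memory
sparks_gb = 4 * 128            # four DGX Sparks, 128 GB each

print(f"~{total_gb:.0f} GB needed; Mac fits: {total_gb <= mac_gb}, "
      f"4x Spark fits: {total_gb <= sparks_gb}")
```

At those assumptions both setups have roughly the same headroom, which is why the "4 Sparks ≈ one 512 GB Mac" equivalence only holds at the capacity level, not in bandwidth or compute.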
So it piqued my interest and got me wondering what the developer/user experience is like standing up vLLM on those machines compared to the DGX Spark. What’s lacking on the Apple machines? What’s lacking on the DGX Spark (besides how bad the state of NVFP4 is)? Anyone have experience with both to compare?
Also, what’s the experience of running multiple Sparks like? I see a lot of posts mentioning eugr’s solution, but I’m wondering if there are other existing solutions as well.
A lot of the value of the DGX Spark is that you have the full Nvidia stack, top to bottom, on the latest chipset. On macOS you have to contend with the fact that almost every deep learning model was built on Nvidia, and MPS is an afterthought at best. Expect a lot of models to have no support and fall back to CPU, plus memory leaks and incorrect or divergent results due to branching in common libraries.
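A quick way to see where you land on a Mac (a sketch assuming PyTorch is installed; the import guard just keeps it from crashing elsewhere). `PYTORCH_ENABLE_MPS_FALLBACK=1` is PyTorch's documented switch to make unsupported ops fall back to CPU instead of raising, which is exactly the silent-slowdown behavior described above:

```python
import importlib.util
import os

# Must be set before torch is imported: ops missing from the MPS backend
# then fall back to CPU instead of raising NotImplementedError.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

device = "cpu"
if importlib.util.find_spec("torch") is not None:
    import torch
    # The mps backend only reports available on Apple Silicon builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        device = "mps"

print(f"selected device: {device}")
```

If this prints `cpu` on an Apple Silicon machine, you are in afterthought territory: the model will run, just not on the GPU.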
However, the reviews of the new M5 for tensor processing and bandwidth have been really good. Combined with Intel’s latest GPU release, and Sony saying the PS6 will have native RNN support in the game loop (most likely on an AMD APU), there might be a bit more thought put into cross-compatibility in the coming years.
Inference software for macOS is not really production-ready, i.e. there is nearly no parallel-processing support. Nice for home use, bad for small prod environments. And prompt processing on Apple is still slow. 512 GB is great, but models of that size are too heavy for the GPU anyhow. Unless you can wait, or do nightly batch processing… but even then it might just take too long, which in turn drives up energy consumption.
There is vLLM MLX (Apple Silicon).
You can run GLM-4.7 (395B) right now on 2–4 Sparks: NVFP4 on 2 Sparks (with a lot of effort) or FP8 on 4 Sparks. But not any of the newer sparse-attention models (GLM-5.x, newer DeepSeek V3.x), as there is no such attention implementation for sm121/GB10, or even for sm120/RTX Pro 6000 (96 GB VRAM).
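The rough numbers behind the 2-vs-4 Spark split (a sketch; 128 GB per Spark and the 15% overhead factor are my assumptions, the 395B parameter count and the per-format bytes/param are standard):

```python
import math

PARAMS_B = 395          # GLM-4.7 total parameters, billions
SPARK_GB = 128          # unified memory per DGX Spark (assumed)
OVERHEAD = 1.15         # KV cache / runtime buffers (assumed fudge factor)

formats = {"nvfp4": 0.5, "fp8": 1.0}   # bytes per parameter

for name, bpp in formats.items():
    total_gb = PARAMS_B * bpp * OVERHEAD
    sparks = math.ceil(total_gb / SPARK_GB)
    print(f"{name}: ~{total_gb:.0f} GB -> at least {sparks} Sparks")
```

NVFP4 lands just under 2×128 GB (hence "lots of effort"), while FP8 needs all four boxes.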
On an M3 Ultra, token generation for models with 35B active parameters (like GLM-4.7, 395B-A35B) is ~10 tokens per second.
On 4x Sparks running with tp=4: the same ~10 tokens per second.
So you get more for your money with the M3 Ultra 512 GB ($10k for 10 tps vs. a minimum of $13.6k for the Sparks), but Apple doesn’t sell the M3 Ultra 512 GB anymore. Maybe there will be an M5 Ultra 512 GB, but no one knows the new prices.
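The price/performance point works out like this (a sketch using only the numbers quoted above; the flat 10 tps for both setups is the thread's figure, not a benchmark of mine):

```python
# Dollars per token/sec at the quoted numbers: both setups hit ~10 tps
# on GLM-4.7 395B-A35B, so the cheaper box simply wins on $/tps.
setups = {
    "M3 Ultra 512GB": {"price_usd": 10_000, "tps": 10},
    "4x DGX Spark":   {"price_usd": 13_600, "tps": 10},
}

for name, s in setups.items():
    dollars_per_tps = s["price_usd"] / s["tps"]
    print(f"{name}: ${dollars_per_tps:,.0f} per token/sec")
```

At equal throughput that is a ~36% price premium for the Spark cluster, before counting the extra networking and power of running four boxes.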