DGX Spark + RTX 6000 Pro Blackwell — disaggregated inference

EXO Labs showed the pattern works: 2× DGX Spark for prefill + an M3 Ultra Mac Studio for decode → ~2.8× end-to-end speedup on Llama-3.1 8B with an 8K prompt, over plain 10 GbE. The Spark is compute-strong / bandwidth-weak; the Mac Studio's wide memory bus eats decode.

RTX 6000 Pro Blackwell looks like a strictly stronger decode partner than the Mac Studio. On paper:

  • Memory bandwidth: ~1.79 TB/s vs M3 Ultra ~800 GB/s

  • Capacity: 96 GB GDDR7 — fits 70B-class FP8, headroom for KV at long context

  • Same silicon family as Spark (Blackwell) — NVFP4 native on both sides, no MLX<->CUDA boundary

  • Interconnect: Spark’s ConnectX-7 200 Gbps QSFP straight into the 6000 Pro host, vs EXO’s 10 GbE

  • Software stack: all CUDA — vLLM / SGLang already do PD-disagg on H100 + GB200, should drop down
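A quick sanity check on the decode side: at batch 1, decode is memory-bandwidth-bound, so an upper bound on tokens/s is roughly bandwidth divided by bytes read per token (about the weight size). A back-of-envelope sketch using the paper specs above; numbers are illustrative, not measured:

```python
# Batch-1 decode roofline: tokens/s <= bandwidth / bytes-per-token,
# where bytes-per-token ~ model weight size (every weight is read once).
# Specs are the on-paper numbers from the list above, not benchmarks.

WEIGHTS_GB = 70  # ~70B params at FP8, 1 byte/param

for name, bw_gb_s in [("RTX 6000 Pro Blackwell", 1790),
                      ("M3 Ultra", 800)]:
    tok_s = bw_gb_s / WEIGHTS_GB
    print(f"{name}: ~{tok_s:.0f} tok/s upper bound at batch 1")
```

Roughly a 2.2× decode ceiling over the M3 Ultra for a 70B FP8 model, before any interconnect or software overheads.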

So the question: has anyone actually wired Spark prefill → RTX 6000 Pro decode?

I think I saw a comment on this board saying that no one has actually reproduced this setup yet.

There is a project (more of a proof of concept, though) for speculative decoding that supports running the drafter on a separate machine. You need high bandwidth between the units to make it work.
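For a sense of why bandwidth matters there: if the verifier returns full logits for each draft position (a design some proofs of concept use; I'm assuming that here, plus Llama-3.1's ~128K vocabulary), the per-round payload is vocab-sized per draft token:

```python
# Per-round payload for a remote drafter, assuming the target ships back
# full fp16 logits for every draft position. Illustrative numbers only.

VOCAB = 128_256   # Llama-3.1 vocabulary size (assumed)
K = 8             # draft tokens per round (assumed)
BYTES = 2         # fp16 logits

payload_mb = K * VOCAB * BYTES / 1e6
for name, gb_s in [("10 GbE", 1.25), ("200 Gb/s ConnectX-7", 25.0)]:
    ms = payload_mb / 1e3 / gb_s * 1e3
    print(f"{name}: {payload_mb:.2f} MB/round -> {ms:.2f} ms transfer")
```

At 10 GbE that transfer time is on the order of a verification pass, which is why the fast link (or shipping only token IDs) matters.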

It would be really interesting if the Spark supported an eGPU, but none of the ports are Thunderbolt.

A ConnectX-7 in the RTX PRO 6000 machine would be required, and you’d be working off the edge of the map based on a research proof of concept.

Yes, it’s documented here: Distributed inference cluster: DGX Spark – RTX 6000 Pro – DevQuasar

So DevQuasar was using TP, not EXO's DP.

DP would let the Sparks aggregate the prefill (compute-bound; the Sparks have a ton of FLOPs collectively), while the 6000 Pro handles decode alone (bandwidth-bound; that plays to its 1.79 TB/s).
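To sanity-check the handoff cost of that split: in PD-disagg the prefill node ships the prompt's KV cache to the decode node once per request. A rough estimate assuming Llama-3.1-70B GQA dimensions (80 layers, 8 KV heads, head_dim 128) and FP8 KV over the 200 Gbps link:

```python
# One-time KV-cache handoff for PD disaggregation: prefill node ships the
# prompt KV, then decode runs locally on the 6000 Pro.
# Dims assumed: Llama-3.1-70B (80 layers, 8 KV heads, head_dim 128), FP8 KV.

LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128
SEQ = 8192            # 8K prompt, matching the EXO numbers above
BYTES = 1             # FP8 KV cache

kv_gb = 2 * LAYERS * KV_HEADS * HEAD_DIM * SEQ * BYTES / 1e9  # K and V
link_gb_s = 200 / 8   # 200 Gb/s ConnectX-7 ~ 25 GB/s
print(f"KV: {kv_gb:.2f} GB -> {kv_gb / link_gb_s * 1e3:.0f} ms over the link")
```

Tens of milliseconds per request, amortized over the whole decode; over 10 GbE the same transfer would take over a second, which is part of why EXO's 2.8× over 10 GbE is encouraging for this setup.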

I posted that. And, to be fair, one guy with a YouTube channel did it very recently, but it's not straightforward. The guy who showed it working is on this forum as well (Alex @alexander.ziskind). They (EXO) certainly have never released it, although they may in the future.
It seems that network latency is an issue because Macs don't have PCIe NICs. I'm not sure how Alex got around that with the Thunderbolt-to-Mellanox enclosures, but maybe the latency is OK at high enough bandwidth; he seemed to do fine with 40 GbE. I have those cards lying around, plus Thunderbolt enclosures (albeit TB3/4) and a Mac Studio (M2 Ultra), so I'm looking to try this. But so far it's theoretical, with few if any true examples, and I'm not even sure you can do it without higher-speed Thunderbolt (he used an M3 Ultra, which carries TB5 ports).

Seems odd; doesn't the RTX 6000 have way more compute than 2× Spark? (But even so, wouldn't you need to pair 1× Spark with 1× RTX 6000?) It doesn't feel like you'd get any performance uplift at all.

I benchmarked 2× RTX 6000 vs 2× Spark on MiniMax M2.7 AWQ-4bit: GPU Benchmark Comparison