What's the biggest LLM you've been able to run on a Cluster of DGX Sparks with a large context window (128k and up)?

Danny-zts · February 26, 2026, 2:37pm

First of all, hello everyone! I’m Danny, new to the forum and excited to be here. I’m thinking of purchasing two DGX Sparks to run as a cluster. My primary use cases are running the brain of my OpenClaw build and vibe coding web and mobile apps. My thinking is that hosting the best LLM I can possibly run for OpenClaw and vibe coding should give the best output. I know that might not be accurate, but I also want this setup to be somewhat future-proof for at least a few years, and as LLMs grow, a cluster should let me run the majority of the models I’d want. Of course, adding a third or fourth Spark to the cluster later is also an option if needed.

So, with all that being said: please chime in on whether a cluster would be overkill for my use case, and tell me the largest LLM you have run successfully on a cluster. I’m also thinking these will probably go up in price soon, given that NVIDIA just raised the price on its end while the Asus Ascent is still going for $3,000 for the 1TB version. Something tells me that price is about to go up soon too.

raphael.amorim · February 26, 2026, 4:01pm

We’ve been sharing our benchmarks here: https://spark-arena.com

Favorites for dual-node setups have been MiniMax-M2.5 (229B) and QuantTrio/Qwen3-VL-235B-A22B-Instruct-AWQ.
You can also run openai/gpt-oss-120b, Qwen/Qwen3.5-122B-A10B and Qwen/Qwen3-Coder-Next-FP8 with really good performance on a cluster.
You could run unsloth/Qwen3.5-397B-A17B-GGUF:Q3_K_M, but at 3-bit quantization you’ll get better results from the first two.
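A rough way to sanity-check which of these fit on one or two Sparks is to estimate the weight footprint as parameters × bits per weight ÷ 8. This is a minimal sketch, assuming 128 GB of unified memory per Spark and a hypothetical `usable_fraction` of 0.85 to leave room for the OS, runtime, and KV cache. The bits-per-weight values in the example are illustrative round numbers, not the exact sizes of the quants named above.

```python
"""Back-of-the-envelope check: do a model's weights fit on N DGX Sparks?"""

SPARK_MEMORY_GB = 128  # unified memory per DGX Spark


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float, n_sparks: int,
         usable_fraction: float = 0.85) -> bool:
    """True if the weights fit in the share of cluster memory allotted to them.

    usable_fraction is a hypothetical headroom setting, not a measured value.
    """
    budget = n_sparks * SPARK_MEMORY_GB * usable_fraction
    return weight_gb(params_billion, bits_per_weight) <= budget


if __name__ == "__main__":
    # Illustrative bits-per-weight values only; real quant files differ.
    for name, params, bpw in [("229B @ 4-bit", 229, 4.0),
                              ("397B @ 3-bit", 397, 3.0),
                              ("397B @ 4-bit", 397, 4.0)]:
        size = weight_gb(params, bpw)
        print(f"{name}: ~{size:.0f} GB, 1 Spark: {fits(params, bpw, 1)}, "
              f"2 Sparks: {fits(params, bpw, 2)}")
```

This only covers the weights. At 128k+ context the KV cache also needs space, which is why the headroom fraction is there.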

cosinus · February 26, 2026, 4:06pm

As for the prices:

As for the “biggest LLM”: maybe first check out what you might get when running different LLMs over here:

using the famous eugr vLLM tools (they make running models on a cluster much easier).

And if you want more speed (not always), have a look over here:

llama.cpp is handy for single spark use. For agentic use vLLM should be better.

And if you have too much money or a lot of YouTube subscribers:

AFAIR, Alex ran Kimi K2 and Qwen3.5 397B; you just need 8 Sparks.

jwarner · February 26, 2026, 5:26pm

Try Step 3.5 Flash: 196B parameters, 10B active, with a highly efficient attention mechanism. There are two quants that work great on the Spark: the official Q4_K_S and the IQ4_XS.

The official one is stable but limited to ~190k context, getting a bit over 20 t/s and dropping to about 8 t/s near the context limit. The other is a little smaller, so you can fit 256k context, and it is actually better in perplexity.
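The trade-off described here (a smaller quant leaving room for a longer context) follows from the KV cache growing linearly with context length: roughly 2 (keys and values) × layers × KV heads × head dimension × bytes per element, per token. The sketch below uses a hypothetical full-attention configuration (48 layers, 8 KV heads, head dimension 128, 16-bit cache) that is not Step 3.5 Flash’s real architecture; a model with more efficient attention, as Step 3.5 Flash is described above, will need less.

```python
"""Rough KV-cache sizing for standard full attention (MHA/GQA)."""


def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    # Factor of 2: one key vector and one value vector per KV head per layer.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                n_tokens: int, bytes_per_elem: int = 2) -> float:
    """KV-cache size in GB (1e9 bytes) for n_tokens of context."""
    per_token = kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem)
    return per_token * n_tokens / 1e9


def max_context(free_gb: float, n_layers: int, n_kv_heads: int, head_dim: int,
                bytes_per_elem: int = 2) -> int:
    """Longest context whose KV cache fits in free_gb of leftover memory."""
    per_token = kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem)
    return int(free_gb * 1e9 // per_token)


if __name__ == "__main__":
    # Hypothetical config, NOT the real Step 3.5 Flash architecture.
    cfg = dict(n_layers=48, n_kv_heads=8, head_dim=128)
    print(f"128k context: ~{kv_cache_gb(n_tokens=131072, **cfg):.1f} GB")
    # Every GB a smaller quant saves becomes room for more context.
    for free in (10, 20):
        print(f"{free} GB free -> ~{max_context(free, **cfg):,} tokens")
```

Because the per-token cost is constant for full attention, freeing memory with a smaller quant buys context length roughly proportionally.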

The IQ4_XS from ubergarm mixes precision and thus isn’t quite as fast as the official one: 17 t/s up to around 80k context, then dropping to around 3-4 t/s at 256k (but stable). It also has a broken Jinja template that you have to fix for tool calling.
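One way to see why throughput falls as context grows: token generation is largely memory-bandwidth bound, so a rough ceiling is bandwidth ÷ bytes read per token, where each token reads the active (not total) weights of an MoE model plus the KV cache accumulated so far. This sketch assumes the Spark’s published 273 GB/s memory bandwidth, a hypothetical 4.5 bits per weight, and a hypothetical full-attention KV cache of 196,608 bytes per token. It only gives an upper bound; measured numbers such as the ~17-20 t/s above sit below it.

```python
"""Bandwidth-bound upper limit on decode speed (tokens/s) for a single Spark."""

SPARK_BANDWIDTH_GBPS = 273.0  # DGX Spark memory bandwidth per NVIDIA's spec, GB/s


def decode_ceiling_tps(active_params_b: float, bits_per_weight: float,
                       context_tokens: int = 0, kv_bytes_per_token: int = 0,
                       bandwidth_gbps: float = SPARK_BANDWIDTH_GBPS) -> float:
    """Upper bound on tokens/s if every token streams the active weights
    plus the whole KV cache from memory once. Ignores compute, runtime
    overhead and caching, so real throughput is lower."""
    weight_bytes = active_params_b * 1e9 * bits_per_weight / 8
    kv_bytes = context_tokens * kv_bytes_per_token
    return bandwidth_gbps * 1e9 / (weight_bytes + kv_bytes)


if __name__ == "__main__":
    # 10B active parameters as quoted for Step 3.5 Flash; 4.5 bpw and the
    # KV size below are hypothetical, not measured properties of the model.
    kv_per_token = 196_608
    for ctx in (0, 32_768, 131_072, 262_144):
        tps = decode_ceiling_tps(10, 4.5, ctx, kv_per_token)
        print(f"context {ctx:>7,}: <= {tps:5.1f} t/s")
```

Attention schemes that keep a smaller KV cache shrink the second term, which is one reason a model can hold its speed at long context better than this naive estimate suggests.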

Several other releases in the last two weeks overshadowed Step 3.5’s release, but it is awesome on a single Spark. I’m planning to make a post about this model, but want to try a new AutoRound quant with vLLM before I do.
