What's the biggest LLM you've been able to run on a Cluster of DGX Sparks with a large context window (128k and up)?

As for the prices:

As for the “biggest LLM”: maybe first check out what you can expect when running different LLMs over here:

using the well-known eugr vLLM tools (they make running models across a cluster much easier).

And if you want some more speed (not always the case), have a look over here:

llama.cpp is handy for single-Spark use; for agentic workloads, vLLM should be the better fit.
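To make the single-Spark vs. agentic distinction concrete, here is a minimal sketch of the two serving setups. The model names and file paths are placeholders, not recommendations; the flags come from the stock llama.cpp and vLLM CLIs:

```shell
# Single Spark: llama.cpp's built-in OpenAI-compatible server.
# -c sets the context window in tokens (131072 = 128k);
# the GGUF path is a placeholder for whatever model you downloaded.
llama-server -m ./models/my-model-q4_k_m.gguf -c 131072 --port 8080

# Agentic / many-concurrent-request workloads: vLLM's server,
# which handles continuous batching of parallel requests much better.
# --max-model-len caps the context length per request.
vllm serve Qwen/Qwen3-32B --max-model-len 131072 --port 8000
```

Both expose an OpenAI-style `/v1/chat/completions` endpoint, so agent frameworks can point at either one; the difference shows up under concurrent load, where vLLM's scheduler keeps throughput up.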

And if you have too much money or a lot of YouTube subscribers:

AFAIR Alex ran Kimi K2 and Qwen3.5 397B - you just need 8 Sparks.
