How many DGX Spark/GB10 devices do you have?

Seeing on this forum what sort of performance is possible for mid-sized models at good quantizations made me start out with two Asus GB10 units.

I have three Founders Edition DGX Sparks. They look great lined up on my bookshelf!

2 x Asus GX10. Thinking of adding two more, but the cost of going from 2 to 4 is quite steep!

Got 2 emails within hours of each other about a month back, from Z.ai and GitHub Copilot, about changes to T&Cs and usage limits. At the same time, my Claude sub started hitting limits in half the time it used to. So much is changing with AI inference providers that it just made sense to start with this platform.

Current - 2 Asus GX-10 (with ConnectX-7).

Current - 2 DGX Sparks (with ConnectX-7)

We’ve grown to six 8x clusters, along with a couple more GB10s for testing software installation. We’re using CRS804 switches. The clusters have been great for running huge models for agentic use cases. I think I can safely say that if anyone has a use case for running the big open-source models 24/7, they can’t go wrong with one or more 8x clusters. It’s made local AI actually make sense for the first time compared with the large clouds, and there’s simply no viable alternative for that other than the Sparks.

@ash.x.kingsley You have to tell us more

@sjug, sure, just ask any questions and I’ll try my best to answer within the limits of what I can discuss. We mostly use them for agentic programming, but also for various odds and ends like documentation, report generation, or automated build and test pipelines. We’re very grateful to the amazing open source community that’s building up around the Sparks, and that’s a significant reason we felt comfortable getting into this. It’s gone so well, though, that there’s no looking back. We’re really hopeful that ever larger models will continue to be released open source, and we see the utility of the Sparks only increasing going forward.

4. Started last fall with an Asus GX10, then got an HP ZGX on a deal, and just added 2x Acer GN100s.

@aolsucks42, what’s been your go-to model on your 4x cluster?

How do you get useful performance out of them? Or are the workloads less interactive, so response times matter less?

Why the Spark over RTX PRO 6000s or even H200s?

Do you run large models like GLM or Kimi? Is it all 8x running a single model?

What are you doing for 200G switching for such large clusters?

@sjug, well, in our view anything ~10+ tokens per second is fine even for interactive use cases, but yeah, almost all of our use is having agents running in the background 24/7. Live interaction isn’t happening all the time, and when it does, the agent has already responded, so there’s no slowdown there. They’re running 24/7, so it’s even less of a concern after hours or on weekends or holidays. Yes, sometimes an agent gets stuck and the human can become the blocker because they’re sleeping or whatever, but that’s fine. I honestly don’t understand the demand for the absolute highest TPS possible, and I think lower TPS is a good tradeoff for running the biggest models, especially since the cluster increases concurrency: multiple agents can run on the cluster simultaneously, which increases total TPS when that’s factored in.
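To put the "~10+ tokens per second, running 24/7" point in numbers, here is a rough sketch. It assumes uninterrupted decoding (no idle time, no prefill), so treat it as an upper bound. The `tokens_per_day` helper and the figure of four concurrent agents are made up for illustration and do not come from the post.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def tokens_per_day(tokens_per_second: float, concurrent_agents: int = 1) -> float:
    """Upper bound on tokens generated in 24 hours of non-stop decoding."""
    return tokens_per_second * concurrent_agents * SECONDS_PER_DAY


# One agent at the ~10 tok/s floor mentioned in the post.
single = tokens_per_day(10)

# Four concurrent agents is a hypothetical value, not a figure from the thread.
several = tokens_per_day(10, concurrent_agents=4)

print(f"{single:,.0f} tokens/day for one agent")
print(f"{several:,.0f} tokens/day for four agents")
```

Even at the low end, a background agent can get through a lot of output per day, which is why raw interactive speed matters less for this kind of workload.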

I think the DGX Spark is at a very specific place in the landscape of options. Beyond a hobbyist experimenting with one or two, it’s intended to be clustered and run the biggest models. There isn’t an alternative for it. I’d say the closest alternative would be a 4x cluster of Mac Studios with either 256GB or 512GB of RAM each, but those aren’t even being sold now and have a number of tradeoffs compared to using NVIDIA hardware.

Discrete GPU-based solutions are much more expensive, use several times more electricity, and consequently produce much more heat. We’re in a relatively modest office setting that already has a lot of other equipment in it, so we can’t really have a room like an AI data center here. Anyone running them at home would be even more constrained, often to the point where a Spark cluster would be their only choice, period, at least while Mac Studios aren’t available.

Other than that, with the Sparks we were able to grow the setup very gradually, testing along the way so we didn’t end up making incorrect assumptions about a new large expense. We were also able to buy them without needing to go through often annoying specialized sales channels. Being able to buy hardware at retail is much more convenient, and it’s not like there are discounts to be had anyway, given the AI demand.

We’ve been evaluating a lot of different models during this journey. Lately we’ve been using DeepSeek-V4-Pro, Kimi-K2.6, Kimi-K2.7-Code, MiniMax-M3, and GLM-5.2-FP8. Usually each 8x cluster is running a different model. Maybe we’ll converge on one dominant model at some point, but we haven’t accumulated enough experience to be able to declare a hands-down winner yet. Perhaps that could be GLM-5.2-FP8, but we haven’t had much time with it yet.

As for the switches, it’s simple: each 8x cluster has its own MikroTik CRS804 switch with 400G to 2x200G breakout DAC cables.
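A quick sanity check on why one switch per 8x cluster works out. This sketch assumes the CRS804 exposes four 400G ports, which is not stated in the thread, so check it against the spec sheet for your model. `breakout_links` is a hypothetical helper.

```python
def breakout_links(switch_ports: int, links_per_port: int = 2) -> int:
    """Number of node-facing links after splitting each switch port."""
    return switch_ports * links_per_port


# Assumption: four 400G ports per switch (not stated in the thread).
CRS804_400G_PORTS = 4
NODES_PER_CLUSTER = 8  # from the post: one switch per 8x cluster

# Each 400G port splits into 2x200G via a breakout DAC cable.
links = breakout_links(CRS804_400G_PORTS)
print(f"{links} x 200G links for {NODES_PER_CLUSTER} nodes")
print("enough links:", links >= NODES_PER_CLUSTER)
```

Under that assumption, every 400G port is fully used and each node gets one 200G link, with no ports left over for uplinks between clusters.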

Overall, I don’t see any other way we could have a total of 6TB of memory capacity for LLMs. Sure, it could be done using GPUs or Macs too, but that’s not easy with the supply issues. I think the Sparks will become unavailable in due time too. They’re already heading to $6,000+. My only major complaint would be that we didn’t see this coming and get started late last year, but at least we got in before most of the price increases.
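For anyone wondering where the 6TB figure comes from, the arithmetic can be sketched as below. The 128 GB of unified memory per GB10 node is the standard spec for these machines but is not stated in the post, and `total_memory_gb` is a hypothetical helper. The extra GB10s used for install testing are left out.

```python
GB_PER_NODE = 128  # unified memory per GB10 node (not stated in the post)


def total_memory_gb(clusters: int, nodes_per_cluster: int,
                    gb_per_node: int = GB_PER_NODE) -> int:
    """Combined memory across all clustered nodes, in GB."""
    return clusters * nodes_per_cluster * gb_per_node


# Six 8x clusters, as described earlier in the thread.
total = total_memory_gb(clusters=6, nodes_per_cluster=8)
print(f"{total} GB total, i.e. {total / 1024:.1f} TB")
print(f"{8 * GB_PER_NODE} GB per 8x cluster")
```

Each 8x cluster holds 1 TB on its own, which is the figure that matters for a single model, since each cluster usually runs a different one.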

I have 2. It’s more than enough for what I want to do, tbh. I think 2 or 4 is the sweet spot. Love these little things.

Starting with 4, most likely ending with 4, but who knows. Cheapest GX10 1TB versions.

Just one GX10. Two is becoming a possibility. But I wouldn’t go beyond that. I think.

I have two Asus GX-10 with CX7 interconnect

Same as robert287

I also have an 8x cluster. Could you share the GLM-5.2 YAML and Dockerfile/mods, if it needs a custom build? Also, what speeds are you getting on the models you mentioned? I’m looking for the best coding agent for an 8x cluster that can achieve above 20 t/s at 100k tokens. I have MiniMax-M3 meeting this criterion, and I hope I can get the Mimo Pro dflash version running, but no luck so far.

I have two HP ZGX Nano G1n’s (second one received yesterday). Trying to set up DeepSeek V4 Flash today: wish me luck!


Tower of Babel:
1 HP ZGX 4TB
1 GX10 4TB
1 AMD 395+ AI 128GB => Oculink to 4060Ti, USB4 to RTX 3050
1 Mac Studio 192GB M2 Ultra
(not pictured: 1 gaming PC turned AI PC → 64GB DDR5, RTX4000 Pro Blackwell, Oculink → 5070Ti)
Every computer has its use and model loaded:

Spark cluster: DeepSeek V4 Flash (Hermes main model)
Mac Studio: DeepSeek V4 Flash via Dwarfstar (currently trying out Rose22/openlumara on GitHub: an AI agent framework written from scratch, not based on openclaw, focused on stripping things down to the bare necessities, optimizing token count, and reducing security risks; modular, so you can enable only exactly what you need)
Strix: Qwen3.6-35B high concurrency cluster (3 instances with 4 concurrencies, aggregate 300 tok/s) for Feynman
PC: Qwen-27b, Gemma4-12b for Claude-Code
– Strix also does: Embedding, reranking, voice (NPU driven), hindsight memory recall, and is the hub for all agents
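As a rough way to read the Strix figure above (3 instances, 4 concurrencies each, 300 tok/s aggregate), here is what it implies per stream. This assumes the aggregate was measured with all streams busy and the load split evenly, which the post does not say. `per_stream_rate` is a hypothetical helper.

```python
def per_stream_rate(aggregate_tps: float, instances: int,
                    concurrency_per_instance: int) -> float:
    """Average tokens/s per request stream, assuming an even split."""
    streams = instances * concurrency_per_instance
    return aggregate_tps / streams


# Figures quoted for the Strix box: 3 instances x 4 concurrent requests.
rate = per_stream_rate(aggregate_tps=300, instances=3,
                       concurrency_per_instance=4)
print(f"{3 * 4} streams at about {rate:.0f} tok/s each")
```

So each individual request sees a fraction of the headline number, and the aggregate only pays off when there are enough parallel requests to keep every stream busy.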

If I had another 8500, I would get another 2 Sparks from Best Buy.