I have ordered a second unit. Don't know why my friends say I'm stupid

This is not a technical post… just a message to the “internet” about this device.
I just ordered a second unit after 4 weeks of real usage with the first one. My experience was the opposite of what everyone says on YT, REDDIT and so on…

GB10 is a really good device. Maybe in the first months after launch the early adopters were unsatisfied with the performance, and that created a bad impression that still echoes today (not sure if that makes sense in English…). People are not “programmed” to change their minds easily; people hate changing their minds.

GB10 can run big MoE and dense models with low power draw.
It’s based on Linux, it’s well designed and looks professional.
It is ready to be used in production and the community is real, passionate and helpful.

I also own an RTX 6000 PRO, and when I need to speed up processing I use it with a similar configuration (vLLM and the same quantized model). The same workflow is 4-5x faster, but it draws about 10x the power.
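To put the speed-versus-power trade-off into numbers, here is a rough sketch. It assumes the 4-5x speedup and the ~10x power figure quoted above hold for the whole job, and that nothing is drawn while idle; the function name is just for illustration.

```python
def energy_ratio(speedup: float, power_ratio: float) -> float:
    """Energy per job on the faster device relative to the slower one.

    Energy = power x time. The faster device finishes in 1/speedup of the
    time but draws power_ratio times the power, so the ratio is
    power_ratio / speedup.
    """
    return power_ratio / speedup


# Figures taken from the post: 4-5x faster, ~10x the power draw.
for speedup in (4.0, 5.0):
    ratio = energy_ratio(speedup, power_ratio=10.0)
    print(f"{speedup:.0f}x faster at 10x power -> {ratio:.1f}x the energy per job")
```

Under those assumptions the faster card still uses roughly 2-2.5x the energy for the same job, which fits the point that the GB10 wins when throughput is not the bottleneck.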

GB10 is good, and in my use cases it is unbeatable. I ordered a second unit and I'm planning to buy more; my target is 4 units.

Long live the king!

For personal futureproofing you are in a good place; I got 4 of them.
In a privacy-first world, this is bang for the buck, right?

Go hard or go home :)

I don’t need a 2nd one… i don’t need a 2nd one… I won’t buy a 2nd one…

dang it.. I want a 2nd one so bad!!! :)

You really get the best value out of Spark if you have 2 or 4.
Even if you use the same models, you can get a nice performance boost and much larger KV cache.

Picking up my 4th later today, and my MikroTik CRS804-DDQ is arriving Monday. They definitely lend themselves to adding more :D

I have just got my third GB10 (though I have to say, I bought it before the price increase). 2 as a cluster for larger 4-bit models, e.g. MiniMax, Qwen3.5, etc. 1 for a pretty fast agentic LLM, ComfyUI, WhisperX, local Dify and a lot of other stuff.
Again: I love it, and I love the enthusiastic community here.

How much of a performance boost would you actually get, going from 1 to 2 units?

And could you expand a bit on what you mean by “best value” in this case?

Up to ~1.8x on dense models, you can play with some benchmarks on https://spark-arena.com/

Best value in that you use the ConnectX-7 networking that otherwise sits there unused, and you can run decently sized models (MiniMax M2.7, even Qwen3-397B) with acceptable performance.

I’m sorry @azampatti, but I’m not sure you’ll find anyone here who regrets buying a second one…

Sure you don’t need a second one. Okay, you don’t need a second one. You don’t. Or maybe… Good luck :D

The ConnectX-7 module by itself costs 1.5-1.7k. If you’re not using it, you’re wasting 37% of the hardware value. With 2, you’re getting 1.7-1.8x performance on dense models, you can run some of the most interesting MoE models, and it opens up new agent orchestration/multi-model complex systems you can PoC on it. So in the end you have a setup that increases the number of use cases and utilizes the whole hardware capability. IMO 2 nodes is the minimum to start unlocking the full Spark potential, unless you really can’t afford it or your use case is really restricted.
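For anyone who wants to sanity-check those numbers, a quick back-of-the-envelope sketch. It only uses the figures quoted in the post above (1.5-1.7k for the NIC, a 37% share, 1.7-1.8x on 2 nodes); the currency is not stated there, and the function names are made up for this example.

```python
def scaling_efficiency(speedup: float, nodes: int) -> float:
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / nodes


def implied_unit_price(nic_cost: float, nic_share: float) -> float:
    """Unit price implied by a NIC cost and its claimed share of the total value."""
    return nic_cost / nic_share


# 1.7-1.8x on 2 nodes, as quoted for dense models.
for speedup in (1.7, 1.8):
    print(f"{speedup}x on 2 nodes -> {scaling_efficiency(speedup, 2):.0%} of linear scaling")

# NIC at 1.5k-1.7k being 37% of the hardware value.
for nic_cost in (1500.0, 1700.0):
    print(f"NIC at {nic_cost:.0f} -> implied unit price of about {implied_unit_price(nic_cost, 0.37):.0f}")
```

So the claim amounts to roughly 85-90% scaling efficiency on dense models, and the 37% figure implies a unit price somewhere around 4.0-4.6k.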

I have ordered a second unit… the only question now is: is this going to be the last one? These mighty Sparks are irresistible!

There is definitely something addictive about the Spark. I started with 1 and I’m up to 4, hoping to make the current ones profitable for the sake of my financial health. I’ll probably end up with 8, which is what my current router, the D804, a simple home router, can handle.

My wallet is the only thing that keeps me from buying a second unit.


I was so concerned about getting my current Spark after all the negativity I read, but I love the thing. Yes, it’s slow at some stuff; other things are perfectly fine. It’s been great for testing/proving flows and ideas, and if I need more speed at that point I rent what I need, but by then I have already proven what I needed instead of wasting hours messing around.

Now I find myself using the Spark for stuff constantly, and a second is high on my list. I just keep going “hmmm, if only it were a little faster” or “I wish I could try that model”. Or when I have the current one maxed out doing video or something in Wan, I wish I had another to use for my other experiments. lol

Love this little box. It’s opened up what I can do so much more than my PC with a 4080 in it.

I just hope I can stop at 2 lol

I went into the deep end right away with two Sparks. I may increase that to four over time as long as I can do it without adding a large noisy switch to the mix.

May I ask what software displays the Gemma usage (in your screenshot)? It looks pretty cool!

Seems to be a Grafana dashboard with values from the metrics vLLM exports

Example vllm/examples/observability/prometheus_grafana/README.md at main · vllm-project/vllm · GitHub

Documentation
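If you just want to peek at the raw numbers before setting up Prometheus and Grafana, something like this minimal sketch should do. It assumes the vLLM OpenAI-compatible server is running on localhost port 8000 and serving its Prometheus text output at /metrics; the metric and model names in the sample are only illustrative, since the exact names vary between vLLM versions.

```python
import urllib.request


def parse_vllm_metrics(text: str, prefix: str = "vllm:") -> dict[str, float]:
    """Extract samples whose metric name starts with `prefix` from Prometheus text output.

    Assumes each sample line is "<name>{labels} <value>" with no trailing
    timestamp; comment lines (# HELP / # TYPE) are skipped.
    """
    metrics: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the last space so spaces inside label values are preserved.
        name_and_labels, _, value = line.rpartition(" ")
        if not name_and_labels.startswith(prefix):
            continue
        try:
            metrics[name_and_labels] = float(value)
        except ValueError:
            continue  # ignore anything that is not a plain number
    return metrics


def fetch_metrics(base_url: str = "http://localhost:8000") -> dict[str, float]:
    """Fetch and parse /metrics from a running server (base_url is an assumption)."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=5) as resp:
        return parse_vllm_metrics(resp.read().decode("utf-8"))


# Illustrative sample only; real output has many more series.
SAMPLE = """# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="demo"} 2.0
process_cpu_seconds_total 12.5
"""

print(parse_vllm_metrics(SAMPLE))
```

Calling `fetch_metrics()` against a live server gives the same kind of dictionary; Grafana is essentially plotting these series over time after Prometheus scrapes them.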

BTW, I also fell into the dual-node setup, and I’m still trying to resist acquiring a third…

Same feelings. I wrote an article about this two months ago. You can read it here: #localai #dgx #agenticai | Dalibor Kubis.

Today is Sunday—let’s pause and take a look around.

The Spark has two standout features in the NVIDIA ecosystem:
First, it’s not a full Blackwell SM120 architecture; it’s Grace Blackwell SM121. And having a higher number doesn’t mean backward compatibility; in fact, it requires significant developer adaptation. In some cases, you even have to fall back to Ampere architectures (like SM80), which also lack tcgen05 and hardware concurrency. As you all know, this leads to issues with NVFP4 and critical shortcomings in continuous batching.
Second, the memory isn’t LPDDR7X or LPDDR6X—it’s LPDDR5X, which is two generations old.

If you have an RTX 6000, great! You can use it for inference tasks that take longer on the Spark.

As of May 2026, a question arises: can smaller models like Qwen3.6 27B, and upcoming LLMs, run faster on an RTX 5090? The reasoning is that models that fit entirely in VRAM and use compressed contexts with TurboQuant could perform much better on the RTX 5090’s 512-bit bus (1.75 TB/s) than on an RTX 6000 Blackwell Pro (1.25 TB/s). Note: the Spark GX10 is at 0.270 TB/s.
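To see why bandwidth matters so much here, a rough upper-bound sketch for single-stream decoding. It assumes a dense 27B model at 4 bits per weight (the quantization is my assumption, not stated above), that every generated token reads all weights once, and it ignores KV cache traffic, compute limits and overhead. The bandwidth figures are simply the ones quoted in the post, not independently checked.

```python
def decode_tokens_per_second_bound(
    bandwidth_tb_s: float, params_billion: float, bits_per_weight: float
) -> float:
    """Bandwidth-only ceiling on decode speed for a dense model.

    Each generated token has to stream all weights from memory once, so
    tokens/s cannot exceed bandwidth divided by the size of the weights.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_tb_s * 1e12 / weight_bytes


# Bandwidth figures as quoted in the post above (TB/s).
quoted_bandwidth = {
    "RTX 5090": 1.75,
    "RTX 6000 Blackwell Pro": 1.25,
    "Spark GX10": 0.27,
}

for device, bandwidth in quoted_bandwidth.items():
    bound = decode_tokens_per_second_bound(bandwidth, params_billion=27, bits_per_weight=4)
    print(f"{device}: at most ~{bound:.0f} tokens/s")
```

Real throughput will be lower than these ceilings, but the ratios between devices follow the bandwidth ratios, which is the whole argument in a nutshell.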