Saw that. I strongly believe NVIDIA could build a great deal of goodwill with this community if they leveled @eugr up with two more Sparks and the necessary switchgear and cables.
The support he has provided has dramatically increased the value of their offering. @johnny_nv, any pull here?
Totally support this. Where do I sign?
Without the support of this community, and thanks to @eugr, it wouldn’t be possible to test the latest models; everything would be much slower. I think this machine is a puzzle.
Yeah, saw that too :D eugr became famous!
There are definitely at least a few community members who deserve some love too, like @raphael.amorim. Nvidia should hire them…
He was already famous in my eyes!
Sometimes, if you work for a company, your hands and tongue are tied, so to speak. Unbiased info is always good, even though all info usually has some form of bias. They should send him 8 of these, enable InfiniBand on the ConnectX-7 interface, and provide a quality switch. Maybe a ~$64k setup.
Thanks guys!
At this point, it’s not just “my” project anymore; there is a whole community around it. Some contribute to the repository directly (e.g. @raphael.amorim), some indirectly (like @christopher_owen), and others by submitting new issues, flagging new models/workarounds in the forums, etc.
@eugr is the celeb on this forum. He’s literally flying sky-high in a helicopter. I second Nvidia sending DGX Sparks to him in bulk, but rabbit-hole investigations show Nvidia has artificially bottlenecked the DGX Spark to separate the $4,000 tier from the $10,000 RTX 6000 tier. I’m doubtful they would do so, although I have seen them send DGX Sparks to podcasters who just learned what the echo command does.
I saw the video! I also support some sparks to eugr :)
Alex’s video sent me over the edge. I went and purchased a second GX10 last night. And now, thanks to eugr’s repo, I have MM M2.5 running. Amazing.
I tried, and decided to stick to the airplanes :) My instructor had a good laugh though, lol.
Funny thing: I originally purchased the second Spark because no one would test inference on a cluster :) That was back in November. It turned out that clustering not only works fairly well but also makes the Spark much more useful.
I remember those times. What a long thread. LOL
When I got my second Spark most of the fun times were gone.
My 2nd GB10 box arrived this morning. I probably wouldn’t have done it so soon if @eugr hadn’t given us the spark-vllm-docker repo to test strategies with :)
The rabbit is waiting for the right quantization 😂
Really nice work. I genuinely enjoy your channel, and it’s always a pleasure to watch your updates. That said, I’m not sure why you chose this specific quantization and stack.
I’m also trying to understand the practical point of the benchmark as it stands, because I’m getting ~81 TPS with llama.cpp on my side, which is clearly higher than what your test shows even on 4 Sparks. So it feels like the experiment needs reframing: either rerun it on EXO, or try alternative tools and different quants to see what the real takeaway is and whether the Spark cluster actually delivers a meaningful advantage in real-world inference.
I have Qwen3-VL 30B on 1 Spark: (17GB, MXFP4, 1M) [256K, 🟢81.3 tok/s ⭐9.7(#2) 🏆].
I think he used full-precision weights, but you are using MXFP4. That could definitely account for the TPS difference.
He used several models in the test, so I’m not sure which one you are talking about. I don’t think he used Qwen3-VL-30B; he used Qwen3-VL-32B (in BF16), which is a dense model with 32B active parameters (vs. 3B active parameters for the MoE Qwen3-VL-30B). The 32B model is very slow even on dual Sparks, unless you go to 4-bit quants; then it runs at around 21 t/s on two Sparks (12 t/s on one).
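To put rough numbers on the dense-vs-MoE point, here is a minimal back-of-envelope sketch, assuming single-stream decoding is memory-bandwidth-bound (each new token has to read all *active* weights once). The ~273 GB/s figure for one Spark, the bytes-per-parameter values, and the `decode_ceiling_tok_s` helper are illustrative assumptions, not anything from the thread or from vLLM/llama.cpp.

```python
# Back-of-envelope ceiling on decode speed for a memory-bandwidth-bound model.
# Assumption: every generated token streams all ACTIVE weights from memory once.
# Ignores KV-cache reads, activations, compute limits, and interconnect overhead,
# so real-world throughput will land below these numbers.

SPARK_BANDWIDTH_GBPS = 273.0  # assumed memory bandwidth of one DGX Spark, GB/s

BYTES_PER_PARAM = {
    "bf16": 2.0,
    "4bit": 0.5,  # ignores scale overhead of real 4-bit formats such as MXFP4
}


def decode_ceiling_tok_s(active_params_billion: float, dtype: str,
                         bandwidth_gbps: float = SPARK_BANDWIDTH_GBPS) -> float:
    """Hypothetical helper: tokens/s if decode were limited only by weight reads."""
    gb_per_token = active_params_billion * BYTES_PER_PARAM[dtype]
    return bandwidth_gbps / gb_per_token


if __name__ == "__main__":
    cases = [
        ("Qwen3-VL-32B dense, BF16", 32.0, "bf16"),
        ("Qwen3-VL-32B dense, 4-bit", 32.0, "4bit"),
        ("Qwen3-VL-30B MoE (3B active), 4-bit", 3.0, "4bit"),
    ]
    for name, active_b, dtype in cases:
        print(f"{name}: <= {decode_ceiling_tok_s(active_b, dtype):.1f} tok/s")
```

These are ceilings, not predictions: under these assumptions the dense 32B model tops out around 4 tok/s in BF16 and about 17 tok/s at 4-bit on one Spark, while 3B active parameters leave far more headroom, which is consistent with the ~12 t/s and ~81 t/s figures above. In the ideal case, splitting a model across two Sparks roughly doubles the bandwidth term.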

