DGX Spark by far the best inference (at the edge) option?


There is a lot of disappointment with the DGX Spark when it comes to inference.

But I think the DGX Spark is by far the best option at the moment for inference at the edge with large models (>>100B), e.g. for SMEs.
Where am I going wrong with that statement?

For on-prem inference at SMEs with limited budgets, there are several options:

  • RTX Pro 6000 Blackwell with 96GB
  • RTX 5090 32GB
  • Mac Ultras
  • DGX Spark
  • Strix Halo

A cluster of two DGX Sparks (around 3100 USD each for the Asus version, so 6200 USD for the pair) runs Minimax M2.1 decently, as well as other LLMs of similar size.

Doing the same with RTX Pro 6000 cards would require two of them. Including the host computer, that is almost 3 times the price. Yes, it is faster. But for that money I could get nearly 6 Sparks, i.e. 3 clusters, which, at least with vLLM, scale well enough at decent speed.
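To make the price comparison concrete, here is a rough sketch of the arithmetic. Only the 3100 USD Spark price comes from the post; the RTX Pro 6000 workstation figure is not a quoted price but is derived by treating "almost 3 times" as exactly 3, so it should be read as an upper bound.

```python
# Prices in USD. Only SPARK_PRICE is taken from the post.
SPARK_PRICE = 3100                     # Asus version of the DGX Spark
SPARK_CLUSTER_PRICE = 2 * SPARK_PRICE  # one cluster = two Sparks

# Assumption: "almost 3 times the price" is rounded up to exactly 3,
# so the workstation figure below is an upper bound, not a real quote.
RTX_PRICE_RATIO = 3
rtx_workstation_price = RTX_PRICE_RATIO * SPARK_CLUSTER_PRICE

# How many Sparks (and two-Spark clusters) the same budget would buy.
sparks_for_same_budget = rtx_workstation_price // SPARK_PRICE
clusters_for_same_budget = sparks_for_same_budget // 2

print(f"Two-Spark cluster:           {SPARK_CLUSTER_PRICE} USD")
print(f"Dual RTX Pro 6000 (at most): {rtx_workstation_price} USD")
print(f"Sparks for the same budget:  {sparks_for_same_budget}")
print(f"Clusters for the same budget: {clusters_for_same_budget}")
```

Because the ratio is rounded up, the real budget buys slightly fewer than 6 Sparks, which matches the "nearly 6" above.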

Mac Studios with Ultra chips are great as well, but my scenario requires at least 256 GB RAM and an 80-core GPU, at around 7000 USD. Prompt processing speed and scaling options are much worse, though token generation (tg) speed is good.

Strix Halo is way too slow and offers no scaling.

The RTX 5090 is out of the running: I would need 5 or 6 of them, which means crazy energy consumption and a huge chassis.
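The device counts above follow from simple memory arithmetic, sketched below. The 96 GB and 32 GB figures are from the list in this post; the 128 GB per DGX Spark is an assumption (unified memory per unit, not stated in the post). The target is the combined memory of two RTX Pro 6000 cards. This counts raw capacity only and ignores KV cache, runtime overhead, and the memory the OS takes on unified-memory machines.

```python
import math

# Baseline from the post: the combined memory of two RTX Pro 6000 cards.
TARGET_GB = 2 * 96

memory_per_device_gb = {
    "RTX Pro 6000 Blackwell": 96,  # from the post
    "RTX 5090": 32,                # from the post
    "DGX Spark": 128,              # assumption: unified memory per unit
}

def devices_needed(target_gb: int, per_device_gb: int) -> int:
    """Smallest number of devices whose combined memory reaches target_gb."""
    return math.ceil(target_gb / per_device_gb)

counts = {
    name: devices_needed(TARGET_GB, gb)
    for name, gb in memory_per_device_gb.items()
}

for name, count in counts.items():
    print(f"{name}: {count} device(s) for {TARGET_GB} GB")
```

Under these assumptions the sketch gives two RTX Pro 6000 cards, six RTX 5090s, or two Sparks for the same memory target.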

Where am I going wrong?

Thank you for your review. We are glad you are liking the Spark. Please let us know if you have any more feedback.

This seems spot on. It’s one of the reasons I am excited about the DGX Spark platform.