Good day. Having lived through the DGX Spark model size issue, I would like some clarification of the "up to 1T parameter" model size claim for the DGX Station.
At FP4, a 1T model would require ~500-600GB plus overhead such as KV cache. This listing does not seem to describe a unified memory model, so the only memory it might fit in is the 496 GB of LPDDR5X RAM, but that will be about as fast as a Mac Studio running inference, given the 396 GB/s limitation. And to get this size into GPU memory, it would need 2 additional A6000's with 96GB each to fit.
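For anyone who wants to check the ~500-600GB figure, here is a rough back-of-the-envelope sketch. It counts weights only (no KV cache or activations). The 4.5 bits per parameter case is my assumption for a block-scaled FP4 format (4-bit values plus one 8-bit scale per 16 values); the function name is made up for illustration, and the capacities are the ones quoted in this thread.

```python
def weight_footprint_gb(params: float, bits_per_param: float) -> float:
    """Weights-only footprint in decimal GB (ignores KV cache and activations)."""
    return params * bits_per_param / 8 / 1e9


PARAMS = 1e12  # the advertised "up to 1T parameter" model

# Plain 4-bit weights, no scaling metadata.
fp4_plain_gb = weight_footprint_gb(PARAMS, 4.0)

# ASSUMPTION: block-scaled FP4 with one 8-bit scale per 16 weights,
# i.e. 4 + 8/16 = 4.5 effective bits per parameter.
fp4_scaled_gb = weight_footprint_gb(PARAMS, 4.5)

# Capacities as quoted in this thread (GB).
LPDDR5X_GB = 496
HBM3E_GB = 252
EXTRA_CARD_GB = 96  # per additional 96GB card mentioned above

print(f"FP4, no scales:     {fp4_plain_gb:.1f} GB")
print(f"FP4, block scales:  {fp4_scaled_gb:.1f} GB")
print(f"Fits in LPDDR5X alone?  {fp4_plain_gb <= LPDDR5X_GB}")
print(f"Fits in HBM3e alone?    {fp4_plain_gb <= HBM3E_GB}")
print(f"HBM3e + 2 extra cards:  {HBM3E_GB + 2 * EXTRA_CARD_GB} GB")
```

Under these assumptions, even the plain 4-bit case is slightly larger than the 496 GB of LPDDR5X on its own, before any KV cache is counted.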
Am I missing something here in the advertising?
I do pray that the Blackwell GPU is a standard SM120 and not a custom, throttled-down or modified new model that vLLM and others will need to write custom code to support.
Lastly, if you would like a 3rd party, independent evaluation of a unit, let me know :-)
To my mind, what we’re really buying is “GPU Memory Up to 252GB HBM3e | 7.1TB/s”. This seems like the finalized spec (I think this was from ASUS). 496GB LPDDR5X @ 396GB/s is a bit disappointing in some ways, but on the other hand it is directly addressable as VRAM, and it comes in slightly under the 2 x ConnectX-8 networking at 200GB/s each (a DGX Spark is 100GB/s per port), so it becomes an interesting ecosystem.
I don’t feel that running a 1T param model is going to be particularly special. 7.1TB/s memory access is special but limited to 252GB. It’s going to be a lot of money for what we get, unlike the Spark, which feels like a bargain. We can buy a lot of dedicated compute hours for the price of one of these, which really makes this device more interesting for long-running tasks.
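To put rough numbers on why the bandwidth matters more than the capacity, here is a simple sketch of the usual memory-bound estimate for single-stream decoding: each generated token has to read the weights once, so tokens/s cannot exceed bandwidth divided by the bytes read per token. This assumes a dense model with all weights read per token and ignores KV cache traffic and compute limits, so treat it as a ceiling, not a prediction. The function name is hypothetical, and the 252 GB case is just an illustrative model that exactly fills the HBM3e.

```python
def max_decode_tokens_per_s(bandwidth_gb_s: float, bytes_read_per_token_gb: float) -> float:
    """Upper bound on single-stream decode speed when memory-bandwidth bound.

    ASSUMPTION: every token reads all weights once (dense model),
    with no KV cache traffic and no compute bottleneck.
    """
    return bandwidth_gb_s / bytes_read_per_token_gb


# Bandwidth figures as quoted in this thread (GB/s).
LPDDR5X_BW = 396
HBM3E_BW = 7100

# ~500 GB of FP4 weights for a dense 1T model, served from LPDDR5X.
slow = max_decode_tokens_per_s(LPDDR5X_BW, 500)

# Illustrative model whose weights exactly fill the 252 GB of HBM3e.
fast = max_decode_tokens_per_s(HBM3E_BW, 252)

print(f"1T dense @ LPDDR5X: <= {slow:.2f} tokens/s")
print(f"252GB dense @ HBM3e: <= {fast:.1f} tokens/s")
```

A model that activates only part of its weights per token would read fewer bytes and land above the first figure, so the dense case is the pessimistic end.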
I still want one though!
This is a tough one, I want one as well, but it’s a tough one.
Not sure what the reasoning was for including so much DDR RAM when designing this; a 1:1 ratio to the GPU memory would have been enough. I do hope they explain the rationale for this one.
Maybe wait for Spark II to release later this year on Rubin and a solved memory roadblock.