Yeah, that was the main selling point for me as well. For training it's even better: I was running pre-training for nanochat and it increased throughput from 14k to 33k tok/sec. On an 8xH100 node at $24/hr, pre-training takes about 4 hours in total, so the cost is ~$100.
On two stacked Sparks it takes 5 days and costs $2.1 * 2 * 5 (depreciation) + $8 * 2 (energy) = $37 for pre-training an LLM (which isn't even the focus of the product).
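A quick sketch of the cost arithmetic above. It assumes the $2.1 figure is depreciation per unit per day and the $8 figure is energy per unit for the whole run; the function names are made up for illustration.

```python
# Rough cost comparison for one nanochat pre-training run, using the figures
# quoted above. Assumption: $2.1 = depreciation per unit per day,
# $8 = energy per unit for the whole run.

def cloud_cost(hourly_rate: float, hours: float) -> float:
    """Cost of renting a node for the duration of the run."""
    return hourly_rate * hours

def local_cost(depreciation_per_unit_day: float, units: int, days: float,
               energy_per_unit: float) -> float:
    """Depreciation of owned hardware plus energy used during the run."""
    return depreciation_per_unit_day * units * days + energy_per_unit * units

h100 = cloud_cost(24.0, 4.0)           # 8xH100 node at $24/hr for ~4 hours
sparks = local_cost(2.1, 2, 5.0, 8.0)  # two stacked Sparks for ~5 days

print(f"8xH100:   ${h100:.0f}")    # $96, i.e. ~$100
print(f"2x Spark: ${sparks:.0f}")  # $37
```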
Pretty neat.