Pre-training Nanochat on DGX Spark (Standalone and Clustered mode)

I’m running some tests on the Spark to pre-train nanochat (karpathy/nanochat on GitHub: "The best ChatGPT that $100 can buy").
There’s a good discussion and a lot of shared benchmarks and instructions here:

A lot is happening in that thread, including instructions for training it on 2 Sparks: