nvFP4 training - Playbook request

Could you please create a playbook for stable nvFP4 training using Transformer Engine? A basic template would be very useful. Thanks!
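
To make the ask concrete, here is the rough shape of the template we're hoping for. This is only a sketch under assumptions: the NVFP4 recipe class name below is a guess on my part (check the transformer_engine.common.recipe module in your TE build for the recipe it actually exposes), while the fp8_autocast pattern around it is the standard TE low-precision flow.

```python
# Sketch of the template we're after (PyTorch + Transformer Engine).
# ASSUMPTION: the NVFP4 recipe class name may differ per TE release;
# look in transformer_engine.common.recipe for what your build exposes.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Assumed NVFP4 recipe; on builds without it, an FP8 recipe such as
# recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID) is the usual fallback.
low_precision_recipe = recipe.NVFP4BlockScaling()  # assumption: name may vary

# Toy model built from TE modules so the matmuls run under the recipe.
model = torch.nn.Sequential(
    te.Linear(1024, 4096),
    te.Linear(4096, 1024),
).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

x = torch.randn(8, 1024, device="cuda")
target = torch.randn(8, 1024, device="cuda")

for step in range(10):
    # fp8_autocast is TE's low-precision context manager; it also drives
    # the newer block-scaled recipes despite the "fp8" in its name.
    with te.fp8_autocast(enabled=True, fp8_recipe=low_precision_recipe):
        out = model(x)
    loss = torch.nn.functional.mse_loss(out, target)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```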


They are very likely not going to bother.

Why not?

Sorry, I’m a bit frustrated with the slow pace of support for the Spark.

I have passed this request along to the playbook team.

@vgoklani Can you tell me which model you want to train, and why you want to use the GB10 for training instead of fine-tuning?

Sure, we want to pre-train Andrej Karpathy’s nanoChat model using nvFP4.

The repo is here:

This is the best use case for the DGX Spark: the models are small (~500M parameters), which gives us a playground to test different model types before deploying to a cloud instance for a larger training run. The kind of integration we have in mind is sketched below.
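
For illustration, the integration we imagine is small, since nanoChat's projections are plain nn.Linear modules. A rough, untested sketch (the swap_linears helper is hypothetical, my own construction rather than anything in nanoChat or TE):

```python
# Hypothetical helper (not part of nanoChat): recursively replace nn.Linear
# modules with te.Linear so TE can run the matmuls under a low-precision recipe.
import torch
import torch.nn as nn
import transformer_engine.pytorch as te

def swap_linears(module: nn.Module) -> None:
    """Replace every nn.Linear in `module` with an equivalent te.Linear."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            replacement = te.Linear(
                child.in_features,
                child.out_features,
                bias=child.bias is not None,
            )
            # Carry over the trained (or initialized) parameters.
            with torch.no_grad():
                replacement.weight.copy_(child.weight)
                if child.bias is not None:
                    replacement.bias.copy_(child.bias)
            setattr(module, name, replacement)
        else:
            swap_linears(child)
```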

FYI, there are several threads in the nanoChat GitHub Discussions where users are training on a DGX Spark. This playbook would be incredibly helpful to a lot of people.

Thanks!

Thanks for the info. A playbook for nanoChat is already in the works and will be published soon. We will also post an update on the forum, so stay tuned.

Will it use nvFP4? That’s the whole point of this request!!!

It will not use nvFP4; the nanoChat repo does not appear to offer nvFP4 support.

The point of this exercise is to train something in nvFP4, and we are proposing nanoChat since it’s a small model and a good base case. And there is clearly demand (just look at the super-long threads in the nanoChat discussion forum).

We don’t care about nanoChat or the nanoChat playbook; the goal is to pre-train in nvFP4, and that is the request. It sounds like we are not communicating properly.

Sorry for the confusion. There is no plan to build an nvFP4 training playbook yet. However, I have passed the request along, and the team will evaluate and prioritize it in the future.