Hello,
I’m trying to use nvfp4 on DGX Spark. From the errors I get, and from the issues and PRs in the TransformerEngine repo (e.g., #2255 and #2279), I gather that I have to build transformer-engine for the right sm_xxxa target (NVTE_CUDA_ARCHS=121a or 120a?). Nothing I have tried has worked so far.
If/when this feature is supported, could you please share exactly how to build TE with nvfp4 support for DGX Spark, and how to properly specify the NVFP4BlockScaling() flags? Also, is there an NGC container I can use?
Please ignore. I meant to post this in the DGX Spark user forum.
Trying to delete the post gives me permission errors.
DGX Spark is sm_121, but I'm not sure whether TransformerEngine supports it. I will investigate.
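In case it helps while this is being looked into, here is a minimal sketch for confirming what compute capability the GPU reports and turning it into a candidate NVTE_CUDA_ARCHS value. The helper names (`nvte_cuda_archs`, `detect_capability`) are made up for illustration, the PyTorch dependency is an assumption, and whether TransformerEngine actually accepts or needs the resulting value (e.g. 121a) on DGX Spark is not confirmed anywhere in this thread.

```python
def nvte_cuda_archs(major: int, minor: int, arch_specific: bool = True) -> str:
    """Format a compute capability as an arch string such as '121a'.

    Hypothetical helper: it only builds the string. It does not check
    whether TransformerEngine supports that architecture.
    """
    suffix = "a" if arch_specific else ""
    return f"{major}{minor}{suffix}"


def detect_capability():
    """Return (major, minor) for GPU 0, or None if PyTorch/CUDA is unavailable."""
    try:
        import torch  # assumption: PyTorch with CUDA support is installed
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = detect_capability()
    if cap is None:
        print("No CUDA device visible from PyTorch")
    else:
        # Candidate value to try when building from source; unverified.
        print(f"sm_{cap[0]}{cap[1]} -> NVTE_CUDA_ARCHS={nvte_cuda_archs(*cap)}")
```

The printed value is only a starting point for a source build; it says nothing about whether the nvfp4 kernels are available for that architecture.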
Thanks for your response and link.
There are also other resources for nvfp4-quantized inference, such as the following two.
However, my use case is different. I am training a custom model (a recurrent variant of Transformer). The model is bandwidth-limited, so its training throughput drops on DGX Spark in spite of Spark’s 128GB memory (my baseline is a Titan RTX workstation). My understanding is that nvfp4 support has to come from the transformer-engine package, but so far it does not seem to be available for DGX Spark (sm_121).
We moved it to the right forum, no need to delete @vida_vakil :-)
ScottE