Why 273 GB/s? Less Is More, Until It Isn’t

Its about loss of information. FACT: you loose something that the llm was glad to have been learned from training: nuances. there is another thread concerning FP4, especially NVFP4. NVIDIA has a new star on the firmament: PPXZPXUT? ( after QAD, QAT and other quant aware techniques. And they ended up in retraining.

Others train llms in INT4 initially (moonshot). So 4 bit is the way, for some applications good enough, but it shows degration if is derived from larger words (16 bit)! And to give you an example of the costs: If you do a research of solution competence vs amount of weights a model should have, you will notice that weights doubled means only competence +2%. So… if you loose 10% or more by quant to 4 bit, it hurts, mentally of course.

I think this is piloting, this is a huge portion of hope and this is application driven. NVIDIA should optimize GLM 4.7 to keep BF16 accuracy in NVFP4. Mission (proof the) impossible! I’d buy another two sparkies if it works.

see also: