Hi there,
I’m very interested in the FP8 scheme in TransformerEngine, i.e. DelayedScaling and fp8_autocast. It’s brand-new data-type support that uses delayed scaling for calibration, unlike int8 quantization.
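For reference, here’s a minimal sketch of how I’m using it (the layer sizes and recipe hyperparameters are just placeholders, not recommendations):

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# Delayed scaling: FP8 scaling factors come from the amax history
# of previous iterations, not from the current tensor.
fp8_recipe = DelayedScaling(
    fp8_format=Format.HYBRID,   # E4M3 in forward, E5M2 in backward
    amax_history_len=16,
    amax_compute_algo="max",
)

model = te.Linear(768, 768).cuda()
inp = torch.randn(32, 768, device="cuda")

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)
out.sum().backward()
```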
Since it runs with PyTorch, I’m wondering whether this FP8 scheme will be upstreamed to the PyTorch community in the near future. As far as I know, PyTorch currently only supports the FP8 data types themselves, without scaling, so FP8 in TransformerEngine could fill this gap in PyTorch.
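To illustrate what I mean by “without scaling” (assuming a recent PyTorch build that exposes the FP8 dtypes):

```python
import torch

x = torch.randn(4, 4)
# PyTorch has the FP8 dtypes, but this is a plain cast:
# no amax tracking or scaling factor is maintained.
x_fp8 = x.to(torch.float8_e4m3fn)
print(x_fp8.dtype)  # torch.float8_e4m3fn
```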
Thanks!