Hi there,
I’m very interested in the FP8 scheme in TransformerEngine, i.e. DelayedScaling and fp8_autocast. It’s brand-new data-type support that uses delayed scaling for calibration, unlike int8 quantization.
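For reference, here’s a minimal sketch of how I’m using it (the layer sizes and recipe hyperparameters are just placeholders, not recommendations):

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# Delayed scaling: FP8 scaling factors come from the amax history
# of previous iterations, not from the current tensor.
fp8_recipe = DelayedScaling(
    fp8_format=Format.HYBRID,   # E4M3 in forward, E5M2 in backward
    amax_history_len=16,
    amax_compute_algo="max",
)

model = te.Linear(768, 768).cuda()
inp = torch.randn(32, 768, device="cuda")

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)
out.sum().backward()
```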
Since it runs with PyTorch, I’m wondering whether this FP8 scheme will be upstreamed to the PyTorch community in the near future. As far as I know, PyTorch currently only supports the FP8 data types themselves, without scaling, so FP8 in TransformerEngine could fill this gap in PyTorch.
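To illustrate what I mean by “without scaling” (assuming a recent PyTorch build that exposes the FP8 dtypes):

```python
import torch

x = torch.randn(4, 4)
# PyTorch has the FP8 dtypes, but this is a plain cast:
# no amax tracking or scaling factor is maintained.
x_fp8 = x.to(torch.float8_e4m3fn)
print(x_fp8.dtype)  # torch.float8_e4m3fn
```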
Thanks!