Unable to run inference using TensorRT FP8 quantization
TensorRT Version: 8.6.1
GPU Type: RTX 4070 Ti
Nvidia Driver Version: 530
CUDA Version: 12.1
CUDNN Version: 184.108.40.206
Operating System + Version: Ubuntu 22.04 LTS
Python Version (if applicable): 3.10
TensorFlow Version (if applicable): —
PyTorch Version (if applicable): —
Baremetal or Container (if container which image + tag): baremetal
NVIDIA claims that the new 4th-generation Tensor Cores support FP8 quantization.
I have an RTX 4070 Ti (Ada Lovelace architecture), which has 4th-gen Tensor Cores and supports the FP8 format, so I installed CUDA 12.1 and TensorRT 8.6.1, which includes the kFP8 data type for FP8.
However, I cannot successfully run inference in FP8. INT8 works fine, but FP8 does not.
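For reference, this is roughly what I am trying (a minimal sketch; the model path is a placeholder, and I am assuming the FP8 builder flag is enabled the same way as the INT8 one):

```python
# Sketch of the engine build I'm attempting. "model.onnx" is a
# placeholder; the FP8 flag usage mirrors how INT8 is enabled.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:  # placeholder model file
    parser.parse(f.read())

config = builder.create_builder_config()
# With trt.BuilderFlag.INT8 here the build succeeds; with FP8 it does not.
config.set_flag(trt.BuilderFlag.FP8)
engine_bytes = builder.build_serialized_network(network, config)
```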
Then, in the limitations section of the documentation, I found that FP8 is not supported in TensorRT yet.
So FP8 is implemented in TensorRT 8.6.1 but not supported yet?
Could you please explain what that means? To me, "not supported" implies it is not implemented either.
- How can I quantize a model in FP8 and run inference in FP8 using TensorRT?
- Which tool should I use instead of TensorRT to quantize, calibrate, and run inference in FP8 format?
- When will TensorRT support the FP8 quantization format?
- Do you have any Early Access software for quantizing and running models in FP8?
Thank you in advance.