Description
Unable to run inference using TensorRT FP8 quantization
Environment
TensorRT Version: 8.6.1
GPU Type: RTX 4070 Ti
Nvidia Driver Version: 530
CUDA Version: 12.1
CUDNN Version: 8.9.2.26
Operating System + Version: Ubuntu 22.04 LTS
Python Version (if applicable): 3.10
TensorFlow Version (if applicable): —
PyTorch Version (if applicable): —
Baremetal or Container (if container which image + tag): baremetal
My Problem
NVIDIA states that the new 4th-generation Tensor Cores support the FP8 format.
I have an RTX 4070 Ti (Ada Lovelace architecture), which has 4th-generation Tensor Cores and supports FP8, so I installed CUDA 12.1 and TensorRT 8.6.1, which includes the kFP8 data type.
However, I cannot successfully run inference in FP8. INT8, on the other hand, works fine; only FP8 fails.
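For reference, this is roughly how I try to build the FP8 engine (a minimal sketch; the ONNX path is a placeholder and the rest mirrors my working INT8 setup):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_fp8_engine(onnx_path):
    builder = trt.Builder(TRT_LOGGER)
    # Explicit-batch network, same as in my INT8 pipeline
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            return None

    config = builder.create_builder_config()
    # The FP8 builder flag is exposed in TensorRT 8.6, analogous to INT8/FP16
    config.set_flag(trt.BuilderFlag.FP8)

    # Returns None when the engine cannot be built
    return builder.build_serialized_network(network, config)

# "model.onnx" is a placeholder for my actual model
engine_bytes = build_fp8_engine("model.onnx")
```

If I use trt.BuilderFlag.INT8 (with a calibrator) instead of the FP8 flag, the same code builds and runs fine; with FP8 I never get a usable engine.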
In the limitations section of the TensorRT documentation, I found that FP8 is not yet supported in TensorRT.
So FP8 is exposed in the TensorRT 8.6.1 API but is not actually supported yet?
Could you please explain what that means? To me, "not supported" implies it is not implemented either.
My Questions
- How can I quantize a model to FP8 and run inference in FP8 using TensorRT?
- Which tool should I use instead of TensorRT to quantize, calibrate, and run inference in FP8?
- When will TensorRT support FP8 quantization?
- Do you have any early-access software for quantizing and running models in FP8?
Thank you in advance.
Best regards