Description
I’ve loaded my ONNX model with the nvonnxparser class and run it successfully.
I’ve also switched the model to FP16 (after checking “platformHasFastFp16”), and that works as well.
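For context, my current build code looks roughly like this (a simplified sketch with error handling omitted; Logger is just my own minimal ILogger implementation):

```cpp
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <iostream>

// Minimal logger (my own implementation, only here so the sketch is self-contained).
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    auto builder = nvinfer1::createInferBuilder(gLogger);
    auto network = builder->createNetworkV2(
        1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));

    // Parse the ONNX model (path is just an example).
    auto parser = nvonnxparser::createParser(*network, gLogger);
    parser->parseFromFile("model.onnx",
                          static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

    auto config = builder->createBuilderConfig();
    config->setMaxWorkspaceSize(1U << 28);

    // This is the FP16 path that already works for me.
    if (builder->platformHasFastFp16())
        config->setFlag(nvinfer1::BuilderFlag::kFP16);

    auto engine = builder->buildEngineWithConfig(*network, *config);
    // ... serialize / run inference with FP32 input buffers ...
    return 0;
}
```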
Now I’m trying to run the model in INT8 mode (I assume “platformHasFastInt8” will be useful here).
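From the samples I’ve looked at, I expect the INT8 build itself to look roughly like this (an unverified sketch continuing from the code above; MyInt8Calibrator is only a placeholder for an IInt8EntropyCalibrator2 implementation I would still have to write):

```cpp
// Unverified sketch, continuing from the build code above.
// MyInt8Calibrator stands in for my own nvinfer1::IInt8EntropyCalibrator2 subclass
// that would feed pre-processed calibration batches.
MyInt8Calibrator calibrator;

if (builder->platformHasFastInt8())
{
    config->setFlag(nvinfer1::BuilderFlag::kINT8);
    config->setInt8Calibrator(&calibrator);
}
auto engine = builder->buildEngineWithConfig(*network, *config);
```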
The thing is, I want to hand the engine the pre-processed input image as 8-bit (INT8U) data, not as FP32.
In FP32/FP16 mode, the pre-processed input image was FP32.
I’ve searched for sample code on Google, but every example I found still feeds FP32 pre-processed input to the INT8-quantized model.
Is there any way to run the TensorRT engine with only 8-bit (INT8U) types, including the input?
Can you give me some references?
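For reference, this is the kind of thing I’m hoping is possible (an unverified sketch, continuing from the build code above; the setType / setAllowedFormats / setDynamicRange calls are my guess at how an INT8 input binding would be declared, so please correct me if this is the wrong direction):

```cpp
// Unverified sketch: mark the network input as INT8 at build time so the engine
// can consume the 8-bit image buffer directly instead of an FP32 one.
nvinfer1::ITensor* input = network->getInput(0);
input->setType(nvinfer1::DataType::kINT8);
input->setAllowedFormats(1U << static_cast<uint32_t>(nvinfer1::TensorFormat::kLINEAR));

// I'm not sure what the right dynamic range is for 0-255 image data mapped to
// signed INT8; this is a guess.
input->setDynamicRange(-128.0f, 127.0f);

// At inference time the input binding would then point to an int8_t buffer
// rather than the float buffer I currently use in FP32/FP16 mode.
```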
Thanks.
Environment
TensorRT Version: TensorRT Docker image release 20.10 (TensorRT 7.2, I believe)
GPU Type: GTX 1650
Nvidia Driver Version: 460.xx
CUDA Version: 11.1
CUDNN Version: 8.0.4 (bundled with the 20.10 container, I believe)
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): Python 3.7
TensorFlow Version (if applicable): none
PyTorch Version (if applicable): none
Baremetal or Container (if container which image + tag): Container (the TensorRT 20.10 Docker image mentioned above)
Relevant Files
Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
Steps To Reproduce
Please include:
- Exact steps/commands to build your repro
- Exact steps/commands to run your repro
- Full traceback of errors encountered