Feeding INT8U input data to an INT8 quantized model


I’ve loaded my ONNX model using the nvonnxparser class and run it successfully.
I’ve also built the model in FP16 mode (after checking “platformHasFastFp16”) and that worked as well.

Now I’m trying to run my model in INT8U mode.
(Perhaps “platformHasFastInt8” can help here.)

But the thing is, I want to feed the pre-processed input image in INT8U type, not as FP32.

In FP32/FP16 mode, the pre-processed input image was FP32.
I’ve searched for example code on Google, but all of the code I found feeds FP32 pre-processed input to the INT8 quantized model.
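To make the data-type question concrete, here is a small pure-Python sketch of the arithmetic that symmetric INT8 quantization implies (the `scale = amax / 127` convention used by TensorRT-style calibration), plus the offset an unsigned 0–255 pixel would need to land in the signed INT8 range. The function names and the `amax` value are illustrative, not TensorRT API:

```python
def quantize(x, amax):
    """Map an FP32 value into signed INT8 with symmetric scaling.

    real_value ~= int8_value * scale, where scale = amax / 127
    comes from calibration; values outside [-128, 127] are clipped.
    """
    scale = amax / 127.0
    q = round(x / scale)
    return max(-128, min(127, q)), scale

def dequantize(q, scale):
    """Recover the approximate FP32 value from its INT8 code."""
    return q * scale

def uint8_to_int8(pixel):
    """Shift an unsigned 0..255 pixel into the signed INT8 range.

    TensorRT 7 has no unsigned INT8 tensor type, so a uint8 image
    typically needs this -128 offset (and the network must be
    calibrated expecting the shifted values).
    """
    return pixel - 128

q, s = quantize(0.5, amax=1.0)
print(q, dequantize(q, s))   # the round-trip is close to 0.5
print(uint8_to_int8(255))    # -> 127
```

The key point the sketch illustrates: an INT8 engine still needs a per-tensor scale, so simply casting the FP32 pre-processed image to 8 bits only works if the input binding’s dynamic range matches how the bytes were produced.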

Is there any way to run TensorRT using only INT8U types?
Can you give me some references?



TensorRT Version: tensorrt docker image release 20.10 (maybe tensorrt 7.4)
GPU Type: GTX1650
Nvidia Driver Version: 460.xx
CUDA Version: 11.1
CUDNN Version: maybe 8.4
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): python3.7
TensorFlow Version (if applicable): none
PyTorch Version (if applicable): none
Baremetal or Container (if container which image + tag):


Hi, please refer to the links below to perform inference in INT8.