How to pass uint8 input to a tensorrt engine?

Description

Hello,

Something is not clear to me regarding TensorRT engine input/output. The input of my network (exported from PyTorch via ONNX) is an image in the [0,255] range. This input is then divided by 255 to bring it into the [0.0,1.0] range.

In the ONNX model the input is defined as float32, and when I convert it into a TensorRT engine the input is still defined as float32. I am able to run the network in FP32 or in INT8 precision (TensorRT Python API + PyCUDA) after doing the calibration, but something still seems odd to me.

Even if I run in INT8 precision, the input has to be defined as float32, so the amount of data to copy from the host to the device is 4x larger than what I would expect if the TensorRT input were defined as INT8.
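To make the 4x figure concrete, here is a minimal sketch in plain Python that compares the host-to-device copy size for the same tensor stored as float32 and as int8. The input shape is a hypothetical example and is not taken from my actual network; `transfer_bytes` is just an illustrative helper.

```python
import struct


def transfer_bytes(shape, fmt):
    """Bytes needed to copy a dense tensor of `shape` whose elements
    use the struct format code `fmt` ("f" = float32, "b" = int8)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * struct.calcsize(fmt)


# Hypothetical input shape (batch, channels, height, width); an assumption.
shape = (1, 3, 224, 224)

float32_bytes = transfer_bytes(shape, "f")  # 4 bytes per element
int8_bytes = transfer_bytes(shape, "b")     # 1 byte per element

print("float32 copy:", float32_bytes, "bytes")
print("int8 copy:   ", int8_bytes, "bytes")
print("ratio:       ", float32_bytes // int8_bytes)
```

The ratio is independent of the shape, since it only depends on the element size.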

My question is very simple. Is it possible to avoid this? If yes, what is the best way to avoid this?

I tried some tricks like the following, forcing the input to be INT8 while building the TensorRT engine, but with this the output is wrong:

for i in range(0, self.network.num_inputs):
    self.network.get_input(i).allowed_formats = 1 << int(trt.TensorFormat.LINEAR)
    self.network.get_input(i).dtype = trt.DataType.INT8

If you can guide me to the best way, it would be great.

Best regards,

Environment

TensorRT Version: 7.1.3
GPU Type: GTX 1080ti
Nvidia Driver Version: 460.73.01
CUDA Version: 11.2
CUDNN Version: 8.0.2
Operating System + Version: ubuntu 18.04
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.9
Baremetal or Container (if container which image + tag):


UINT8 format is actually not supported by TensorRT.

Hi @thomas.boulay,

UINT8 data type is currently not supported by TensorRT. The method you’re trying is actually the correct one. Not sure why the output is wrong.

for i in range(0, self.network.num_inputs):
    self.network.get_input(i).allowed_formats = 1 << int(trt.TensorFormat.LINEAR)
    self.network.get_input(i).dtype = trt.DataType.INT8

Could you try setting self.network.get_input(i).dynamic_range = (-127 * scale, 127 * scale), where scale is the actual scale used to convert the data from FP32 to INT8?
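As a rough illustration of how a scale relates to the dynamic range above, here is a minimal sketch in plain Python. It assumes symmetric INT8 quantization where a real value is approximated by scale × q; the helper names (`dynamic_range_from_scale`, `quantize`, `dequantize`) are hypothetical and not part of the TensorRT API, and the clamp bounds are an assumption.

```python
def dynamic_range_from_scale(scale):
    """Symmetric (min, max) pair matching the (-127 * scale, 127 * scale) formula."""
    return (-127.0 * scale, 127.0 * scale)


def quantize(x, scale):
    """Map a real value to an INT8 code: round(x / scale), clamped to [-128, 127]."""
    q = round(x / scale)
    return max(-128, min(127, q))


def dequantize(q, scale):
    """Map an INT8 code back to the real value it represents."""
    return q * scale


# If the INT8 bytes are meant to be read as their own integer values,
# the scale is 1.0 and the dynamic range is (-127, 127).
print(dynamic_range_from_scale(1.0))

# If the same bytes are meant to represent values in roughly [-1, 1],
# the scale is 1/127 and the dynamic range is about (-1, 1).
print(dynamic_range_from_scale(1.0 / 127.0))

# Values outside the dynamic range saturate.
print(quantize(300.0, 1.0), dequantize(64, 0.5))
```

Under this assumption, the pair assigned to dynamic_range simply states which real values the extreme INT8 codes stand for.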

Also, we have to be careful with calibration when using INT8 I/O formats. Please refer to:
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#reformat-free-calibration

Thank you.

Hi @spolisetty ,

I didn’t get how to set the scale parameter you mentioned, to be honest.

Actually, my network is trained on [0,255] images, which are simply divided by 255 to get [0, 1.0] images.
My idea was to feed the network with uint8 images to reduce the transfer from host to device and do this /255 on the board but, as you said, it’s not possible because the uint8 format is not supported.

Now, you are saying that INT8 is supported. So first, I guess this means I have to retrain my network with [-128, 127] images, which are then divided by 128 to get [-1.0, 1.0] images. Am I correct?
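For reference, the arithmetic relating the two ranges can be sketched as follows. This is only an illustration in plain Python: it shows that a uint8 pixel shifted by 128 fits into INT8 and that adding 128 back before the /255 recovers the original normalized value. Whether such an offset can be placed inside the network and handled correctly by TensorRT is an assumption that nothing in this thread confirms; the function names are hypothetical.

```python
def uint8_to_int8(u):
    """Shift a uint8 pixel in [0, 255] into the INT8 range [-128, 127]."""
    if not 0 <= u <= 255:
        raise ValueError("pixel value out of uint8 range")
    return u - 128


def normalize_from_int8(s):
    """Undo the shift, then apply the original /255 normalization."""
    return (s + 128) / 255.0


for u in (0, 1, 128, 255):
    s = uint8_to_int8(u)
    # The shifted value, normalized with the offset restored,
    # equals the original pixel divided by 255.
    assert normalize_from_int8(s) == u / 255.0
    print(u, "->", s, "->", normalize_from_int8(s))
```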

Secondly, what I didn’t get is how TensorRT handles this /128 on the input when running in INT8 precision. Basically, when we infer in INT8 precision, this /128 should be managed just by adjusting the quantization factor. I mean, if the input is just [-128,127] images, all 7 bits are reserved for the integer part, so the quantization factor is equal to 1. On the other hand, if we divide by 128 to get [-1.0, 1.0] images, all 7 bits are reserved for the fractional part, so the quantization factor should be equal to 2^7, if my understanding of the quantization step is correct.
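To illustrate what I mean, here is a small sketch in plain Python showing that dividing by 128 after reading the INT8 codes with a factor of 1 gives the same numbers as reading the codes directly with a factor of 1/128 (that is, 2^-7 when the convention is real = scale × q, or 2^7 when it is written as q = real × factor). It only checks the arithmetic and says nothing about what TensorRT actually does when building the engine; `dequantize` and `fold_division` are hypothetical helpers.

```python
def dequantize(q, scale):
    """Real value represented by INT8 code q under the convention real = scale * q."""
    return q * scale


def fold_division(scale, divisor):
    """Fold a following 'divide by divisor' step into the scale itself."""
    return scale / divisor


folded_scale = fold_division(1.0, 128.0)  # 2 ** -7

for q in (-128, -1, 0, 64, 127):
    two_step = dequantize(q, 1.0) / 128.0   # integer-valued input, then /128
    one_step = dequantize(q, folded_scale)  # same code, scale already includes /128
    assert two_step == one_step
    print(q, two_step)
```

Because 128 is a power of two, both paths give bit-identical floating-point results here.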

Is TensorRT able to detect in the ONNX model that my input layer is followed by a divide layer (/128) when the engine is created and the calibration is done?
And to come back to the scale factor you mentioned, how can I link this factor to what I explained, or is it something completely different? Actually, I didn’t get “scale is the actual scale the used to convert the data from FP32 to INT8” because I don’t convert the data from FP32 to INT8; I just feed INT8 images with [-128, 127] values directly to the network.

I hope my questions are clear enough and that you can help me.

Br,