How to pass uint8 input to a tensorrt engine?

thomas.boulay · September 14, 2021, 4:43pm

Description

Hello,

Something is not clear to me regarding TensorRT engine input/output. The input of my network (exported from PyTorch via ONNX export) is an image in the [0, 255] range. This input is then divided by 255 to bring it into the [0.0, 1.0] range.

In the ONNX model the input is defined as float32, and when I convert it into a TensorRT engine the input is still defined as float32. I am able to run the network in FP32 or INT8 precision (TensorRT Python API + PyCUDA) after doing the calibration, but something still seems odd to me.

Even if I run in INT8 precision, the input has to be defined as float32, so the amount of data to copy from the host to the device is 4x larger than what I would expect if the TensorRT input were defined as int8.
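To make the 4x figure concrete, here is a rough sketch of the host-to-device copy size for a single image. The 3x224x224 shape is a hypothetical example and is not the input shape of the network discussed in this thread.

```python
# Hypothetical input shape (channels, height, width); not taken from the thread.
shape = (3, 224, 224)

num_elements = 1
for dim in shape:
    num_elements *= dim

BYTES_FLOAT32 = 4  # size of one float32 element
BYTES_INT8 = 1     # size of one int8 element

float32_bytes = num_elements * BYTES_FLOAT32
int8_bytes = num_elements * BYTES_INT8

print(f"float32 input: {float32_bytes} bytes")
print(f"int8 input:    {int8_bytes} bytes")
print(f"ratio:         {float32_bytes // int8_bytes}x")
```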

My question is very simple. Is it possible to avoid this? If yes, what is the best way to avoid this?

I tried some tricks like the following, forcing the input to be INT8 while building the TensorRT engine, but with this the output is wrong:

for i in range(0, self.network.num_inputs):
    self.network.get_input(i).allowed_formats = 1 << int(trt.TensorFormat.LINEAR)
    self.network.get_input(i).dtype = trt.DataType.INT8

If you can guide me to the best way, that would be great.

Best regards,

Environment

TensorRT Version: 7.1.3
GPU Type: GTX 1080ti
Nvidia Driver Version: 460.73.01
CUDA Version: 11.2
CUDNN Version: 8.0.2
Operating System + Version: ubuntu 18.04
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.9
Baremetal or Container (if container which image + tag):


ninolendt · September 15, 2021, 9:32pm

UINT8 format is actually not supported by TensorRT.

spolisetty · September 16, 2021, 5:10am

Hi @thomas.boulay,

The UINT8 data type is currently not supported by TensorRT. The method you’re trying is actually the correct one. Not sure why the output is wrong.

for i in range(0, self.network.num_inputs):
    self.network.get_input(i).allowed_formats = 1 << int(trt.TensorFormat.LINEAR)
    self.network.get_input(i).dtype = trt.DataType.INT8

Could you try setting self.network.get_input(i).dynamic_range = (-127 * scale, 127 * scale), where scale is the actual scale used to convert the data from FP32 to INT8?
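Putting the snippet and the dynamic range suggestion together, a minimal sketch could look like the following. It assumes the TensorRT 7.x Python API, where each input tensor exposes allowed_formats, dtype, and dynamic_range. The helper name configure_int8_inputs, the choice to pass the tensorrt module in as the trt argument, and the default scale of 1.0 are illustrative assumptions, not something taken from the thread.

```python
def configure_int8_inputs(network, trt, scale=1.0):
    """Mark every network input as linear INT8 with a symmetric dynamic range.

    network: a TensorRT INetworkDefinition (assumed already populated).
    trt:     the imported tensorrt module, passed in explicitly (assumption
             made so this sketch has no import-time dependency).
    scale:   real value represented by one INT8 step (hypothetical default).
    """
    for i in range(network.num_inputs):
        tensor = network.get_input(i)
        # Allow only the linear (non-vectorized) memory layout.
        tensor.allowed_formats = 1 << int(trt.TensorFormat.LINEAR)
        # Ask TensorRT to accept INT8 data directly on this binding.
        tensor.dtype = trt.DataType.INT8
        # Symmetric range, as suggested in the reply above.
        tensor.dynamic_range = (-127 * scale, 127 * scale)
```

With a real builder this would presumably be called as configure_int8_inputs(network, trt) after parsing the ONNX model and before building the engine. A scale of 1.0 corresponds to feeding raw integer values in roughly [-127, 127].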

Also we have to be careful with calibration when using IO formats with INT8, Please refer

Thank you.

thomas.boulay · September 16, 2021, 2:50pm

Hi @spolisetty ,

To be honest, I didn’t get how to set the scale parameter you mentioned.

Actually, my network is trained using [0, 255] images which are simply divided by 255 to get [0, 1.0] images.
My idea was to feed the network uint8 images to reduce the transfer from host to device and do this /255 on the board but, as you said, it’s not possible because the uint8 format is not supported.

Now, you are saying that INT8 is supported. So first, I guess for me it means retraining my network with [-128, 127] images which are then divided by 128 to get [-1.0, 1.0] images. Am I correct?

Secondly, what I didn’t get is how TensorRT handles this /128 on the input when we are in INT8 precision. Basically, when we infer in INT8 precision, this /128 should just be handled by adjusting the quantization factor. I mean, if the input is just [-128, 127] images, all 7 bits are reserved for the integer part, so the quantization factor is equal to 1. On the other hand, if we divide by 128 to get [-1.0, 1.0] images, all 7 bits are reserved for the fractional part, so the quantization factor should be equal to 2^7, if my understanding of the quantization step is correct.
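The arithmetic behind this reasoning can be sketched as follows. This is generic symmetric quantization math (real value = integer code times scale), not a statement about how TensorRT actually fuses layers. The function name dequantize is a hypothetical helper. In this convention the 2^7 factor appears as a scale of 1/128.

```python
def dequantize(q, scale):
    """Real value represented by the int8 code q under symmetric quantization."""
    return q * scale

# Scale 1: the int8 code is the integer value itself.
print(dequantize(64, 1.0))        # 64.0

# Dividing by 128 afterwards is the same as using a scale of 1/128 (2^-7).
print(dequantize(64, 1.0) / 128)  # 0.5
print(dequantize(64, 1.0 / 128))  # 0.5

# The equivalence holds for every int8 code, since 1/128 is exact in binary.
all_equal = all(
    dequantize(q, 1.0) / 128 == dequantize(q, 1.0 / 128)
    for q in range(-128, 128)
)
print(all_equal)                  # True
```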

Is TensorRT able to detect in the ONNX model that my input layer is followed by a divide layer (/128) when the engine is created and the calibration is done?
And to come back to the scale factor you mentioned, how can I link this factor to what I explained, or is it something completely different? Actually, I didn’t get “scale is the actual scale used to convert the data from FP32 to INT8” because I don’t convert the data from FP32 to INT8; I just input INT8 images with [-128, 127] values directly to the network.

I hope my questions are clear enough and that you can help me.

Br,

spolisetty · September 20, 2021, 6:23pm

Hi,

We recommend you to try following.

Train the network with signed INT8 input
Specify [-128, 127] as the dynamic range to TensorRT. Because TRT uses symmetric quantization, it will interpret this as [-128, 128], but since it’s only used for calculating the rescaling for the first op in the network, it shouldn’t matter. You might want to experiment and see if [-127, 127] works better, but we’d expect only a marginal difference. You could also try training the network with a symmetric dynamic range for the inputs, though we appreciate that’s quite annoying if you have an integer range [0, 255] to start with.
When calibrating the network, provide an FP tensor with the same values as the INT8 tensor (i.e. in the range [-128, 127]).
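The data preparation implied by these steps might look like the sketch below. It assumes that [0, 255] pixels are mapped to signed values by subtracting 128, which the thread does not state explicitly, and that the network is trained on the same shifted values. The helper names to_signed_int8 and calibration_copy are hypothetical, and the standard-library array module stands in for whatever buffer type the real pipeline uses.

```python
from array import array

def to_signed_int8(pixels_uint8):
    # Assumption: shift [0, 255] to [-128, 127] by subtracting 128.
    return array("b", (p - 128 for p in pixels_uint8))

def calibration_copy(values_int8):
    # Float32 buffer holding the same numeric values as the INT8 input,
    # for use during calibration.
    return array("f", (float(v) for v in values_int8))

pixels = array("B", [0, 1, 127, 128, 255])  # sample uint8 pixel values
signed = to_signed_int8(pixels)             # what the INT8 engine would receive
calib = calibration_copy(signed)            # what the calibrator would receive

print(list(signed))  # [-128, -127, -1, 0, 127]
print(list(calib))   # [-128.0, -127.0, -1.0, 0.0, 127.0]
```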

Thank you.

thomas.boulay · September 21, 2021, 8:27am

Hi @spolisetty,

OK, thanks for the confirmation. I followed it with my network trained with uint8 input and it’s now working (by working I mean the outputs of the FP32 inference and the INT8 inference are nearly the same). I will definitively confirm that by retraining a network with int8 input.
My error was to define a uint8 input buffer instead of int8, so even with the dynamic range defined, the input values were wrong and so the output was wrong.
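One plausible reading of this error, sketched with the standard-library array module: when bytes written as uint8 are read back as int8, every value above 127 wraps around to a negative number. The sample values are illustrative only.

```python
from array import array

raw = array("B", [0, 100, 127, 128, 200, 255])  # values written as uint8

reinterpreted = array("b")
reinterpreted.frombytes(raw.tobytes())          # same bytes, read as int8

print(list(reinterpreted))  # [0, 100, 127, -128, -56, -1]
```

Values up to 127 survive unchanged, which may explain why such a bug can produce output that is wrong without failing outright.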

Thanks for your support
