Something is not clear to me regarding the TensorRT engine input/output. The input of my network (exported from PyTorch via ONNX) is an image in the [0, 255] range. This input is then divided by 255 to bring it into the [0.0, 1.0] range.
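For reference, the normalization is baked into the exported graph, roughly like this (the backbone, input shape, and file name below are placeholders, not my exact code):

import torch

class WithNormalization(torch.nn.Module):
    # wrapper that bakes the /255 normalization into the exported graph
    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone

    def forward(self, x):                  # x: image in [0, 255]
        return self.backbone(x / 255.0)    # backbone sees [0.0, 1.0]

model = WithNormalization(my_backbone).eval()   # my_backbone: placeholder
dummy = torch.rand(1, 3, 224, 224) * 255        # shape is illustrative
torch.onnx.export(model, dummy, "model.onnx", opset_version=11)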
In the ONNX model the input is defined as float32, and after conversion to a TensorRT engine the input is still float32. I am able to run the network in FP32 precision, or in INT8 precision (TensorRT with the Python API + PyCUDA) after doing the calibration, but something still seems weird to me.
Even when I run in INT8 precision, the input has to be defined as float32, so the amount of data to copy from host to device is 4x larger than it would be if the TensorRT input were defined as int8.
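To make the 4x concrete: for an illustrative 1x3x640x640 input, the float32 host buffer I have to copy today is four times the size of the int8 buffer I would like to copy:

import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda

shape = (1, 3, 640, 640)                 # illustrative input shape
n = int(np.prod(shape))

h_fp32 = cuda.pagelocked_empty(n, dtype=np.float32)  # 4 bytes per element
h_int8 = cuda.pagelocked_empty(n, dtype=np.int8)     # 1 byte per element

print(h_fp32.nbytes)  # 4915200 bytes -- what I copy per inference now
print(h_int8.nbytes)  # 1228800 bytes -- what an int8 binding would need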
My question is very simple: is it possible to avoid this, and if so, what is the best way to do it?
I tried some tricks like the following, forcing the input to INT8 during the TensorRT engine build, but with this the output is wrong:
for i in range(self.network.num_inputs):
    # request a linear (NCHW) format and an int8 binding for each input
    self.network.get_input(i).allowed_formats = 1 << int(trt.TensorFormat.LINEAR)
    self.network.get_input(i).dtype = trt.DataType.INT8
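For completeness, here is roughly the full build context in which I set those properties (the ONNX path and the calibrator object stand in for my actual ones):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)
with open("model.onnx", "rb") as f:      # placeholder path
    parser.parse(f.read())

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = my_calibrator   # my existing IInt8EntropyCalibrator2

for i in range(network.num_inputs):
    inp = network.get_input(i)
    inp.allowed_formats = 1 << int(trt.TensorFormat.LINEAR)
    inp.dtype = trt.DataType.INT8
    # do I also need to set inp.dynamic_range here so TensorRT knows
    # how to interpret the raw int8 bytes?

engine = builder.build_engine(network, config)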
If you can guide me to the best way to do this, it would be great.
TensorRT Version: 7.1.3
GPU Type: GTX 1080 Ti
Nvidia Driver Version: 460.73.01
CUDA Version: 11.2
CUDNN Version: 8.0.2
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.9
Baremetal or Container (if container which image + tag):