The impact of network input data types on inference speed and accuracy

Description

While accelerating my own model with TensorRT, I have run into a problem: when I feed the network different input data types for inference, both the inference speed and the inference results change, as follows:

The code is as follows:
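(My original script was attached as a screenshot, so here is a minimal sketch of the comparison I am making. The engine path, the binding order, the input shape, and the float32 input/output bindings are all placeholders or assumptions, not my exact code.)

```python
import numpy as np
import tensorrt as trt
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the FP16 engine (path is a placeholder).
with open("model.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Device buffers sized from the engine bindings (assumed to be float32),
# so the allocations are large enough no matter which host dtype is tested.
in_shape = tuple(engine.get_binding_shape(0))
out_shape = tuple(engine.get_binding_shape(1))
d_input = cuda.mem_alloc(int(np.prod(in_shape)) * np.float32().itemsize)
d_output = cuda.mem_alloc(int(np.prod(out_shape)) * np.float32().itemsize)

def infer(host_input):
    # Copy the host array as-is (float32 or uint8) and run one inference.
    cuda.memcpy_htod(d_input, np.ascontiguousarray(host_input))
    context.execute_v2([int(d_input), int(d_output)])
    host_output = np.empty(out_shape, dtype=np.float32)
    cuda.memcpy_dtoh(host_output, d_output)
    return host_output

image = np.random.randint(0, 256, size=in_shape)   # stand-in for a real image
out_f32 = infer(image.astype(np.float32))
out_u8 = infer(image.astype(np.uint8))
print(out_f32)
print(out_u8)  # in my case this differs from the float32 result
```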


The impact of different data types on inference time:

Different predictions:
The float32 (input data type) prediction results:

The uint8 (input data type) prediction results:

In addition, I also compared the uint8 and float32 input data themselves, as follows:

I found that the values are the same except for the data type!
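In text form, the kind of check I ran looks like this (`img_u8` / `img_f32` are stand-ins for my actual preprocessed arrays):

```python
import numpy as np

# Stand-ins for the two preprocessed inputs from my script.
img_u8 = np.random.randint(0, 256, size=(1, 3, 224, 224)).astype(np.uint8)
img_f32 = img_u8.astype(np.float32)

print(img_u8.dtype, img_f32.dtype)                         # uint8 float32
print(np.array_equal(img_u8.astype(np.float32), img_f32))  # True: same values
```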

Both predictions use the same TensorRT-accelerated model, built with FP16 precision.

Looking forward to your help, thanks a lot!

Environment

TensorRT Version: 8.2
GPU Type: GeForce RTX 3060 Ti
Nvidia Driver Version: 510
CUDA Version: 11.3
CUDNN Version: 8.2
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.7.9
TensorFlow Version (if applicable): not used
PyTorch Version (if applicable): 1.10.1+cu113
Baremetal or Container (if container which image + tag):

Hi,

We request you to share the model, script, profiler, and performance output (if not already shared) so that we can help you better.

Alternatively, you can try running your model with the trtexec command.
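For example, one way to invoke it from Python (the ONNX file name, input tensor name, and shape below are placeholders for your own model; you can equally run the same command directly in a terminal):

```python
# A sketch of invoking trtexec from Python; the file name, input tensor name,
# and shape are placeholders. trtexec must be on PATH.
import subprocess

subprocess.run(
    [
        "trtexec",
        "--onnx=model.onnx",           # or --loadEngine=<your engine file>
        "--fp16",                      # build/run with FP16 enabled
        "--shapes=input:1x3x224x224",  # only needed for dynamic input shapes
        "--dumpProfile",               # per-layer timing breakdown
    ],
    check=True,
)
```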

While measuring the model performance, make sure you consider the latency and throughput of the network inference only, excluding the data pre- and post-processing overhead.
Please refer to the below links for more details:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-803/best-practices/index.html#measure-performance

https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-803/best-practices/index.html#model-accuracy
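As a rough sketch of what "network inference only" means (reusing the `context`, `d_input`, and `d_output` from the snippet in your post; the names and iteration counts are placeholders):

```python
# Time only context.execute_v2(), excluding pre/post-processing.
import time
import pycuda.driver as cuda

WARMUP, RUNS = 10, 100

for _ in range(WARMUP):                      # warm-up runs, not measured
    context.execute_v2([int(d_input), int(d_output)])
cuda.Context.synchronize()

start = time.perf_counter()
for _ in range(RUNS):
    context.execute_v2([int(d_input), int(d_output)])
cuda.Context.synchronize()                   # make sure all GPU work is done
elapsed = time.perf_counter() - start

print(f"mean latency: {1000 * elapsed / RUNS:.3f} ms")
print(f"throughput:   {RUNS / elapsed:.1f} inferences/s")
```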

Thanks!