Accuracy drop when performing inference with a .trt engine in a Python script


TensorRT Version: 7.2.3
Nvidia Driver Version: 460.32
CUDA Version: 11.2
CUDNN Version: 8.1
Operating System + Version: Ubuntu 18.04
Python Version: 3.6


Hi, I am trying to perform inference with a TRT engine on an AWS EC2 instance.

For this I am using a Python script that loads the images and then runs inference with the TRT engine.

The model is an EfficientNet-B1 trained with the TAO Toolkit 3.0. When running inference with the .tlt model the results are good, but when running it with the .trt engine through the Python script, the results for the same set of images are much worse.

Do you know what could be wrong? My guess is that it is the preprocessing, since I am feeding the images in without applying any. However, when training the model in TAO I did not specify any preprocessing either. Do you know what the default preprocessing is, so that I can replicate it in Python?
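For example, if the default were "torch"-style ImageNet normalization (just an assumption on my part, as is the 240x240 input size for EfficientNet-B1), I suppose I would replicate it roughly like this:

```python
import numpy as np

# ImageNet statistics used by "torch"-style preprocessing (assumption:
# verify the actual mode/mean/std against the TAO training spec).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(image_hwc_uint8):
    """Turn an HWC uint8 RGB image into the normalized CHW float32
    tensor a TensorRT classification engine usually expects.
    (Resize the image to the network input size beforehand.)"""
    x = image_hwc_uint8.astype(np.float32) / 255.0   # scale to [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD           # per-channel normalize
    return np.ascontiguousarray(np.transpose(x, (2, 0, 1)))  # HWC -> CHW

# Synthetic 240x240 image (240 assumed for EfficientNet-B1).
img = np.random.randint(0, 256, (240, 240, 3), dtype=np.uint8)
tensor = preprocess(img)
print(tensor.shape, tensor.dtype)  # (3, 240, 240) float32
```

Is this close to what the engine expects, or does TAO use a different mode (e.g. plain mean subtraction)?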

Thank you very much,
best regards

Could you please share the model, script, profiler, and performance output (if not shared already) so that we can help you better?
Alternatively, you can try running your model with trtexec command.

While measuring model performance, make sure you consider the latency and throughput of the network inference only, excluding the data pre- and post-processing overhead.
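For illustration, a minimal sketch of that kind of measurement (the dummy workload below stands in for the actual TensorRT execute call):

```python
import time

def measure(fn, warmup=10, iters=100):
    """Time only the inference call itself, excluding any
    pre- or post-processing of the data."""
    for _ in range(warmup):          # warm-up runs absorb one-off init costs
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / iters * 1e3, iters / elapsed  # ms/iter, iters/s

# In a real script, fn would wrap only context.execute_v2(...) plus the
# stream synchronize; here a dummy workload stands in for the engine.
latency_ms, throughput = measure(lambda: sum(range(1000)))
print(f"{latency_ms:.4f} ms/iter, {throughput:.1f} iters/s")
```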
Please refer to the link below for more details:



I can share the code and the dataset with you, but would it be possible to do it privately?

Also, the first link you’ve shared is broken, or at least I can’t see the page.

Thank you in advance


We have noticed that you’re using an older version of TensorRT. We recommend you use the latest TRT version. It also looks like you’re using TAO. If you need further assistance, we recommend moving this post to the TAO forum to get better help.

Thank you.