The first inference with a TensorRT model takes far longer than with a TensorFlow model


When using TensorRT, the first inference takes 126073.25 ms (about 126 s). From the second inference onward, each inference takes about 25 ms.
When using TensorFlow, the first inference takes 2292.62 ms (about 2.3 s). From the second inference onward, each inference takes about 35 ms.
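
To compare the two frameworks, it helps to time the first call separately from the steady state, since one-time setup work (graph tracing, engine building, memory allocation) only shows up in the first measurement. Below is a minimal sketch, assuming a hypothetical `infer` callable that wraps either the TensorFlow or the TensorRT model's inference on one batch. The function names here are illustrative and not taken from the original script.

```python
import time
from statistics import median
from typing import Any, Callable, List, Tuple


def time_inference(infer: Callable[[Any], Any], batch: Any, runs: int = 10) -> Tuple[float, float]:
    """Return (first-call latency, median of the remaining calls), both in ms.

    `infer` is a hypothetical stand-in for the real inference function.
    If it launches asynchronous GPU work, it should block until results
    are ready (e.g. by copying outputs to host arrays); otherwise the
    measured times undercount the real latency.
    """
    if runs < 2:
        raise ValueError("need at least 2 runs to separate first and steady-state timings")
    timings_ms: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(batch)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    # First entry includes any one-time setup; the rest are steady state.
    return timings_ms[0], median(timings_ms[1:])


# Usage with a dummy function whose first call simulates a setup cost.
state = {"calls": 0}


def dummy_infer(x: Any) -> Any:
    state["calls"] += 1
    time.sleep(0.05 if state["calls"] == 1 else 0.001)
    return x


first, steady = time_inference(dummy_infer, batch=None, runs=5)
print(f"first: {first:.1f} ms, steady state: {steady:.1f} ms")
```

Reporting the median of the later runs, rather than the mean, keeps an occasional slow call from skewing the steady-state figure.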

I have attached three log files at the link below: tf_log.txt, trt_log.txt, and trt_log_trimmed.txt.
trt_log_trimmed.txt is an excerpt of trt_log.txt, trimmed so that its inference times can be compared directly with tf_log.txt.

The first inference with TensorRT seems far too long, so I would like to know how to reduce the first inference time when using the TensorRT model.

My settings are as follows:


TensorRT Version:
libnvinfer-dev: 6.0.1-1+cuda10.1
libnvinfer-plugin6: 6.0.1-1+cuda10.1
libnvinfer5: 5.1.5-1+cuda10.0
libnvinfer6: 6.0.1-1+cuda10.1

GPU Type: GeForce RTX 2080 Ti
Nvidia Driver Version: 455.23.05
CUDA Version: 10.1.243
CUDNN Version: 7.6.5
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): 3.8.6
TensorFlow Version (if applicable): tensorflow-gpu==2.3.0rc0
PyTorch Version (if applicable): none
Baremetal or Container (if container which image + tag): none

Relevant Files

See the files at the link below.

Steps To Reproduce

  1. Download the code from
  2. Run with the TensorFlow model: python3 --framework tf
  3. Run with the TensorRT model: python3 --framework trt