Inference time of TensorRT 6.3 is slower than TensorRT 6.0

Description

Hi,
We measured inference time with our own YOLOv3 model, but the inference time of TensorRT 6.3 is slower than that of TensorRT 6.0.
TensorRT 6.0: average 11 ms
TensorRT 6.3: average 30 ms

We can also see the same problem with the sample program (sampleOnnxMNIST) included in the TensorRT package.
Please refer to the attached profile results.
What should we do to make TensorRT 6.3 as fast as TensorRT 6.0?

Profile result of TensorRT 6.0
profileresult_trt_6_0.qdrep (827.2 KB)

Profile result of TensorRT 6.3
profileresult_trt_6_3.qdrep (925.3 KB)

Environment

Environment for TensorRT 6.0
TensorRT Version: 6.0.1.8 (official release)
GPU Type: GTX 1080
Nvidia Driver Version: 440.118.02
CUDA Version: 10.2
CUDNN Version: 7.6.5.32
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
Baremetal or Container (if container which image + tag): 19.12-tf1-py3

Environment for TensorRT 6.3
TensorRT Version: 6.3.1 (included in DriveOS 5.2)
GPU Type: GTX 1080
Nvidia Driver Version: 440.118.02
CUDA Version: 10.2
CUDNN Version: 7.6.6.184-1
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9

Relevant Files

profileresult_trt_6_0.qdrep (827.2 KB)
profileresult_trt_6_3.qdrep (925.3 KB)

Steps To Reproduce

Just execute sampleOnnxMNIST, which is included in each TensorRT package.

Hi,
Please share the model, script, profiler output, and performance numbers, if not already shared, so that we can help you better.
Alternatively, you can try running your model with the trtexec command:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

While measuring the model performance, make sure you consider the latency and throughput of the network inference, excluding the data pre- and post-processing overhead.
Please refer to the link below for more details:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-722/best-practices/index.html#measure-performance
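
As a rough illustration of that advice, here is a minimal timing sketch, assuming you already have an nvinfer1::IExecutionContext* and a bindings array of device pointers; the names below are placeholders, not code from the sample:

#include <chrono>
#include "NvInfer.h"

// Measure only the inference call, excluding data pre- and post-processing.
double averageLatencyMs(nvinfer1::IExecutionContext* context, void** bindings)
{
    // Warm-up runs so one-time initialization does not skew the numbers.
    for (int i = 0; i < 10; ++i)
        context->execute(1, bindings); // use executeV2(bindings) for explicit-batch engines

    const int iterations = 100;
    const auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < iterations; ++i)
        context->execute(1, bindings); // synchronous call; returns after the GPU work finishes
    const auto end = std::chrono::high_resolution_clock::now();

    return std::chrono::duration<double, std::milli>(end - start).count() / iterations;
}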

Thanks!

Hi @NVES

We used your sample code and model so that you can reproduce the issue.

We modified sampleOnnxMNIST.cpp to measure inference time.
The modified files are attached below.

sampleOnnxMNIST_trt60.cpp (13.3 KB)
sampleOnnxMNIST_trt63.cpp (13.7 KB)
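
The timing change is roughly of this form: a sketch using CUDA events so that only the GPU execution is measured (context, bindings, and stream are placeholders, not the exact contents of the attached .cpp files):

#include <cuda_runtime_api.h>
#include "NvInfer.h"

// Time a single inference with CUDA events, measuring only the GPU work on the stream.
float inferenceTimeMs(nvinfer1::IExecutionContext* context, void** bindings, cudaStream_t stream)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, stream);
    context->enqueue(1, bindings, stream, nullptr); // enqueueV2(bindings, stream, nullptr) for explicit-batch engines
    cudaEventRecord(stop, stream);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}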

And I’ve already attached the profiler output to the first post (profileresult_trt_6_0.qdrep and profileresult_trt_6_3.qdrep).
Please refer to them.

We’ll try running trtexec while waiting for your investigation.

Thanks

Hi @eri.kasamatsu,

We recommend you try the latest TensorRT version (official release). Please let us know if you still face this issue.

Thank you.

Hi @spolisetty

We’ll port our model to Drive AGX (DriveOS 5.2) soon.
That’s why we use TensorRT 6.3, not TensorRT 7.

Thanks

Hi

I’ve run the sampleOnnxMNIST model with trtexec and attached the results.
You can see that the total layer runtime of TensorRT 6.3 is much slower than that of TensorRT 6.0.

trtexec_trt6_0.txt (5.1 KB)

trtexec_trt6_3.txt (6.2 KB)

Furthermore, the layers are quite different between TensorRT 6.0 and TensorRT 6.3.
Does this cause the performance degradation?

Thanks

Hi @eri.kasamatsu,

Sorry for the delayed response. TensorRT 6.0 uses implicit batch, while TensorRT 6.3 uses explicit batch for the ONNX parser. There were some issues in the parser that have been resolved in later versions.
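
For reference, the API difference being described looks roughly like this; a minimal sketch of network creation, not code taken from either sample:

#include "NvInfer.h"

// Implicit-batch vs. explicit-batch network creation.
nvinfer1::INetworkDefinition* createNetworkFor(nvinfer1::IBuilder* builder, bool explicitBatch)
{
    if (!explicitBatch)
    {
        // TensorRT 6.0 path: implicit batch dimension, batch size given at execute()/enqueue() time.
        return builder->createNetwork();
    }
    // TensorRT 6.3 ONNX parser path: the batch dimension is part of the tensor shapes.
    const auto flags = 1U << static_cast<uint32_t>(
        nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    return builder->createNetworkV2(flags);
}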

Thank you.