TensorRT Inference is Slower Than Other Frameworks

I just created a TensorRT YOLOv3 engine. However, it brings no inference speedup. When I run the same model with PyTorch I get 20 FPS, but TensorRT inference only yields around 10 FPS. I am working on my notebook’s GTX 1050 Max-Q with CUDA 10. The only warning I got end to end (from converting YOLO to an engine through to inference) is:

[TensorRT] WARNING: TensorRT was linked against cuDNN 7.6.3 but loaded cuDNN 7.4.2

I wasn’t sure what other information to include, so that’s all for now. I’m happy to add more on request.


Can you provide the following information so we can better help?
Provide details on the platforms you are using:
o Linux distro and version
o GPU type
o Nvidia driver version
o CUDA version
o CUDNN version
o Python version [if using python]
o Tensorflow and PyTorch version
o TensorRT version

Also, if possible please share the script & model file to reproduce the issue.
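Part of the list above can be gathered automatically. The following is only a minimal sketch: `collect_versions` is a hypothetical helper name, and it assumes the packages expose a `__version__` attribute. It does not cover the driver, CUDA, or cuDNN versions; the driver version is printed by `nvidia-smi`.

```python
import importlib
import platform
import sys


def collect_versions(packages=("torch", "torchvision", "tensorflow", "tensorrt")):
    """Return a dict with OS, Python, and installed package versions."""
    info = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            module = importlib.import_module(name)
            # Assumption: the package exposes __version__.
            info[name] = getattr(module, "__version__", "unknown")
        except Exception:  # not installed, or the install is broken
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the thread covers the Python, framework, and TensorRT entries in one go.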


Ubuntu 18.04.3 LTS
GeForce GTX 1050 with Max-Q Design/PCIe/SSE2 Driver Version 440.26
CUDNN 7.4.2
Python 3.6.8
Tensorflow-GPU 1.14.0
Torch 1.3.1
Torchvision 0.4.2

Model is from https://github.com/pjreddie/darknet/blob/master/cfg/yolov3.cfg

The scripts I used are from https://github.com/penolove/yolov3-tensorrt .
The repo might not work right away when you clone it, but minor edits, without changing its core, should get it working.

Couple of recommendations:

  1. The warning seems to be caused by the older cuDNN version. Could you please try upgrading cuDNN to 7.6.5?
        Please refer to the support matrix below:
  2. The ONNX parser isn’t currently compatible with ONNX models exported from PyTorch 1.3. If you downgrade to PyTorch 1.2, this issue should go away.
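To confirm that the first point applies, the warning line itself can be checked. This is a rough sketch with hypothetical helper names (`parse_cudnn_warning`, `loaded_is_older`), and it assumes the message keeps the wording quoted earlier in the thread.

```python
import re

# Assumption: the warning keeps the wording quoted in this thread.
_PATTERN = re.compile(
    r"linked against cuDNN (\d+)\.(\d+)\.(\d+) but loaded cuDNN (\d+)\.(\d+)\.(\d+)"
)


def parse_cudnn_warning(line):
    """Return (linked, loaded) version tuples, or None if the line does not match."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    numbers = tuple(int(group) for group in match.groups())
    return numbers[:3], numbers[3:]


def loaded_is_older(line):
    """True when the cuDNN loaded at runtime is older than the one TensorRT was built with."""
    parsed = parse_cudnn_warning(line)
    if parsed is None:
        return False
    linked, loaded = parsed
    return loaded < linked  # tuples compare element by element


if __name__ == "__main__":
    warning = ("[TensorRT] WARNING: TensorRT was linked against cuDNN 7.6.3 "
               "but loaded cuDNN 7.4.2")
    print(parse_cudnn_warning(warning), loaded_is_older(warning))
```

For the warning posted above, the loaded version (7.4.2) is older than the linked one (7.6.3), which is the case the upgrade suggestion addresses.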


Hi, I just ran into a similar problem. I used the TensorRT Python samples for YOLOv3, with ONNX parser version 1.4.1. The sample program runs successfully and gets 17 FPS on pure inference time (data -> GPU + model inference), without post-processing. I then switched to MXNet GluonCV’s version of YOLOv3, which is the same model (darknet53_coco), and also got around 17 FPS for network inference alone. The post-processing in the TRT sample is too slow in Python and brings the FPS down to 3, so I left it out of the measurement.

Here are my versions:
PC: Intel® Core™ i7-7700HQ CPU @ 2.80GHz × 8 + GeForce GTX 1060/PCIe/SSE2
Ubuntu 18.04
CUDA 10.1
Cudnn: 7.5.0
MXNet: 1.5.1
yolov3 input size: 608 x 608
python: 3.6.8
data type: float32

Do you have any suggestions? Or is it normal to see similar FPS? I already got over 2x speedup on simple ResNet18 and ResNet50 models when testing with TRT in FP32.
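For anyone comparing numbers like these across frameworks, it helps to time every backend the same way. Below is a minimal sketch of such a harness, not the method used in the posts above: `measure_fps`, `infer`, and `sync` are hypothetical names. `infer` is any zero-argument callable that runs one forward pass, and `sync` is an optional callable that blocks until queued GPU work has finished.

```python
import time


def measure_fps(infer, n_warmup=10, n_runs=50, sync=None):
    """Average FPS of infer() over n_runs calls, after n_warmup untimed calls.

    sync (optional) must block until all queued GPU work is complete;
    without it, asynchronous backends can look faster than they are.
    """
    for _ in range(n_warmup):  # warm-up: lazy initialization, caches, autotuning
        infer()
    if sync is not None:
        sync()  # make sure warm-up work is not counted
    start = time.perf_counter()
    for _ in range(n_runs):
        infer()
    if sync is not None:
        sync()  # wait for the last timed call to finish
    elapsed = time.perf_counter() - start
    return n_runs / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Stand-in workload; replace with a real forward pass.
    fps = measure_fps(lambda: time.sleep(0.001), n_warmup=2, n_runs=20)
    print(f"{fps:.1f} FPS")
```

With PyTorch, for example, `torch.cuda.synchronize` can be passed as `sync`. Passing the same `n_warmup` and `n_runs` to every backend keeps the comparison consistent.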


TensorRT is a high performance neural network inference optimizer and runtime engine.
The pre and post-processing steps depend strongly on the particular application.

Please refer to the link below for optimizing Python performance.


Thanks for the fast reply, but what I am asking is: is it normal to have similar FPS?

Please be aware that I didn’t include pre- or post-processing AT ALL in the time estimate. Also, TensorRT works well for simple ResNet inference, with a good speedup.


Can you share the model file and script to reproduce this issue so we can better help?
Meanwhile, please try to use the latest supported TRT version.