I just created a TensorRT YOLOv3 engine, but it does not improve inference time. When I run the same model with PyTorch I get 20 FPS, while TensorRT inference only yields around 10 FPS. I work on my notebook’s GTX 1050 Max-Q with CUDA 10. The only warning I got end to end (from converting YOLO to an engine through to inference) is:
[TensorRT] WARNING: TensorRT was linked against cuDNN 7.6.3 but loaded cuDNN 7.4.2
I am not sure what other information would be useful, so that’s all for now. I’m happy to add more on request.
Can you provide the following information so we can better help?
Provide details on the platforms you are using:
o Linux distro and version
o GPU type
o Nvidia driver version
o CUDA version
o cuDNN version
o Python version [if using Python]
o TensorFlow and PyTorch version
o TensorRT version
Also, if possible please share the script & model file to reproduce the issue.
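For anyone gathering the details listed above, here is a minimal sketch that prints the OS, Python version, and installed framework versions. It assumes Python 3.8+ (for `importlib.metadata`) and that the packages were installed under the pip names shown, which is an assumption: builds such as `tensorflow-gpu` or a CUDA-specific MXNet wheel use different names. GPU type, driver, and CUDA version are not covered here; `nvidia-smi` and `nvcc --version` report those.

```python
import platform
import sys
from importlib import metadata  # available from Python 3.8


def collect_versions(packages=("torch", "tensorflow", "tensorrt", "mxnet")):
    """Return a dict with OS, Python, and installed package versions.

    The default package names are assumptions; adjust them to match
    the distribution names actually installed on your machine.
    """
    info = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            # Reads installed metadata without importing the package.
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the thread, together with the `nvidia-smi` output, should cover most of the list.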
The scripts I used are from GitHub - penolove/yolov3-tensorrt .
The repo might not work right away after you clone it, but minor edits that leave its core unchanged will get it working.
Hi, I just ran into a similar problem. I used the TensorRT Python samples for YOLOv3, with ONNX parser version 1.4.1. I ran the sample program successfully and got 17 FPS on pure inference time (data → GPU + model inference), without post-processing. Then I switched to MXNet GluonCV’s version of YOLOv3, which is the same darknet53_coco model, and I also got around 17 FPS with network inference only. The post-processing in the TRT sample is too slow in Python and drags the FPS down to 3, so I left it out of the measurement.
Here are my versions:
PC: Intel® Core™ i7-7700HQ CPU @ 2.80GHz × 8 + GeForce GTX 1060/PCIe/SSE2
Ubuntu 18.04
CUDA 10.1
cuDNN: 7.5.0
MXNet: 1.5.1
TensorRT: 5.1.5.0
yolov3 input size: 608 x 608
python: 3.6.8
data type: float32
Are there any suggestions? Or is it normal to get similar FPS? I already got over a 2x speed-up on simple ResNet-18 and ResNet-50 models when testing with TRT in FP32.
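To make FPS figures like the ones above comparable between frameworks, it helps to time them the same way. The following is only a sketch of one common approach: run a few warm-up iterations, then average over many timed calls. `measure_fps` and `fake_inference` are hypothetical names, and the sleep stands in for a real forward pass.

```python
import time


def measure_fps(run_once, warmup=10, iterations=100):
    """Return average calls per second of run_once().

    For GPU inference, run_once must block until the result is ready
    (for example by synchronizing the CUDA stream it uses); otherwise
    only the time to launch the work is measured.
    """
    # Warm-up calls absorb one-time costs such as lazy initialization.
    for _ in range(warmup):
        run_once()

    start = time.perf_counter()
    for _ in range(iterations):
        run_once()
    elapsed = time.perf_counter() - start
    return iterations / elapsed


if __name__ == "__main__":
    def fake_inference():
        # Stand-in workload; replace with the real inference call.
        time.sleep(0.01)

    print(f"{measure_fps(fake_inference, warmup=2, iterations=20):.1f} FPS")
```

Using the same input size, batch size, and precision for both frameworks keeps the comparison fair.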
TensorRT is a high performance neural network inference optimizer and runtime engine.
The pre and post-processing steps depend strongly on the particular application.
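Since pre- and post-processing sit outside the engine, timing each stage separately shows where the frame time actually goes. Below is a rough sketch of such a breakdown; `StageTimer` and the stage names are hypothetical, and the sleeps are placeholders for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named pipeline stage."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Add the elapsed time even if the stage raised.
            self.totals[name] += time.perf_counter() - start

    def shares(self):
        """Return each stage's fraction of the total measured time."""
        total = sum(self.totals.values())
        if total == 0:
            return {}
        return {name: t / total for name, t in self.totals.items()}


if __name__ == "__main__":
    timer = StageTimer()
    for _ in range(5):
        with timer.stage("preprocess"):
            time.sleep(0.002)   # placeholder for resize / normalize
        with timer.stage("inference"):
            time.sleep(0.005)   # placeholder for the engine call
        with timer.stage("postprocess"):
            time.sleep(0.010)   # placeholder for box decoding / NMS
    for name, share in timer.shares().items():
        print(f"{name}: {share:.0%}")
```

If post-processing dominates, as reported earlier in this thread, a faster engine alone will not raise the end-to-end FPS much.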