YOLOv3 TensorRT Inference Super Slow In Nano

I am experimenting with the TensorRT Python API on my Jetson Nano. I am using this repo https://github.com/penolove/yolov3-tensorrt for all the operations needed to run YOLOv3 on TensorRT, and I added

builder.fp16_mode = True

into the engine creation block.
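
For context, here is a minimal sketch of what such an engine-creation block looks like with FP16 enabled, assuming the repo's ONNX-based flow and the TensorRT 5/6 builder API; the onnx_path argument and the workspace size are just placeholders:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

def build_engine(onnx_path):
    # Sketch only: TensorRT 5/6-style builder with FP16 mode enabled.
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network() as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = 1 << 28  # 256 MiB; keep this modest on the Nano
        builder.max_batch_size = 1
        builder.fp16_mode = True              # the flag added on top of the repo's script
        with open(onnx_path, 'rb') as f:
            parser.parse(f.read())
        return builder.build_cuda_engine(network)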

I have JetPack 4.2.2 on my Nano. I get 0.5 s per frame with TensorRT inference, while a pure PyTorch (v1.3) and torchvision (v0.5) implementation of YOLO gives 0.38 s per frame. I wasn't expecting 30 FPS inference, of course, but this is so disappointing.

P.S. I timed only the forward passes in both inference scripts.
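
For reference, forward-pass timing on the PyTorch side can be done along the lines of the sketch below (model and img are placeholder names); synchronizing the GPU before reading the clock makes the measured interval cover the actual kernel work:

import time
import torch

def time_forward(model, img, n_runs=50):
    # Average the latency of model(img) alone, excluding pre/post-processing.
    with torch.no_grad():
        for _ in range(5):           # warm-up iterations
            model(img)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(n_runs):
            model(img)
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_runs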

Hi,

Sorry for the late reply.

Would you mind trying our DeepStream YOLOv3 sample instead?
You can find it in /opt/nvidia/deepstream/deepstream-4.0/sources/objectDetector_Yolo/

Here is a change that updates the network size to 416 and can achieve 20 fps on the Nano:
https://devtalk.nvidia.com/default/topic/1064871/deepstream-sdk/deepstream-gst-nvstreammux-change-width-and-height-doesn-t-affect-fps/post/5392823/#5392823
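
In case the link moves: the change essentially comes down to setting the network input resolution in the sample's Darknet config. The snippet below only illustrates the relevant [net] keys in yolov3.cfg; the linked post has the exact DeepStream-side settings.

[net]
batch=1
width=416
height=416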

Thanks.

I tested yolov3-608, yolov3-416 and yolov3-288 on Jetson Nano with JetPack 4.3 (TensorRT 6). The results are as follows. Please refer to my GitHub repository (Demo #4) for more details: https://github.com/jkjung-avt/tensorrt_demos

[Image: benchmark results for the yolov3-608, yolov3-416 and yolov3-288 TensorRT engines on Jetson Nano]