YOLOv2 only runs at up to 4.5 fps using TensorRT

I have built a YOLOv2 network with TensorRT 3.0 on a TX2, using plugin layers for reorg, region, concat, and PReLU (the concat layer serves the same function as the route layer in Darknet), and it produces correct results. However, when I run the network in jetson-inference's detectnet-camera, it only reaches 4.5 fps, compared with 37.5 fps for facenet-120 in jetson-inference.
I only added a PluginFactory as:


and built the engine as:

nvinfer1::ICudaEngine* engine = infer->deserializeCudaEngine(modelMem, modelSize, &pluginFactory);

These are the main differences from the jetson-inference source code. Also, the forward pass runs at the same speed whether I use FP16 or FP32.
Could someone give me some suggestions? Thank you in advance!


1. The plugin API doesn't support FP16, so TensorRT automatically inserts conversions around plugin layers.
2. There is a slight stall before each plugin implementation runs.

It's recommended to profile your application with NVVP (the NVIDIA Visual Profiler); it can help you find the bottleneck layer first.
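For example, you could capture a timeline on the TX2 with nvprof and open the result in NVVP (the binary name and output path below are illustrative):

```shell
# Capture a full timeline and open detectnet.nvvp in NVVP afterwards.
nvprof --output-profile detectnet.nvvp ./detectnet-camera

# Or print a per-kernel GPU time summary directly on the console,
# which is often enough to spot the slow plugin kernels.
nvprof --print-gpu-summary ./detectnet-camera
```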


fujiaweigege, I am making a similar effort to run YOLOv2 using TensorRT. I am new to TensorRT and would greatly appreciate it if you could share the plugin layers you implemented for YOLOv2.