Overall, I can reproduce this perf issue: the throughput is lower than expected, and the drop appears to be caused by the YoloV3 post-processor.
Here are my tests:
1. Tried your GitHub code.
After only adding the RTSP link, e.g. rtsp://${RTSP_IP}/media/video1, it failed to run.
2. Tried the command below.
The displayed fps is ~5, so the total fps = BATCH * 5 = 16 * 5 = 80 fps.
gst-launch-1.0 -v rtspsrc location=rtsp://${RTSP_IP}/media/video1 latency=100 ! tee name=t \
t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_0 \
t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_1 \
t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_2 \
t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_3 \
t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_4 \
t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_5 \
t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_6 \
t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_7 \
t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_8 \
t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_9 \
t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_10 \
t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_11 \
t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_12 \
t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_13 \
t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_14 \
t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_15 \
nvstreammux name=m batch-size=$BATCH width=640 height=480 live-source=1 ! \
queue ! nvinfer config-file-path=./config_infer_primary_yoloV3.txt batch-size=$BATCH \
! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=false
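The 16 identical branches in the command above could also be generated in a loop, which makes it easier to retry with a different batch size. This is only a sketch: the `build_branches` helper and the `PIPELINE` variable are names made up for illustration, and the default `RTSP_IP` is a placeholder address, not a real camera. The script only prints the command so it can be inspected before running.

```shell
# Sketch: build the per-stream branches in a loop instead of writing them by hand.
# BATCH and RTSP_IP defaults are assumptions for illustration only.
BATCH="${BATCH:-16}"
RTSP_IP="${RTSP_IP:-192.0.2.1}"   # placeholder address, replace with the camera IP

# Hypothetical helper: emits one decode branch per mux sink pad (sink_0 .. sink_N-1).
build_branches() {
    n="$1"
    i=0
    out=""
    while [ "$i" -lt "$n" ]; do
        out="${out}t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_${i} "
        i=$((i + 1))
    done
    printf '%s' "$out"
}

# Same elements and properties as the hand-written command above.
PIPELINE="rtspsrc location=rtsp://${RTSP_IP}/media/video1 latency=100 ! tee name=t \
$(build_branches "$BATCH")\
nvstreammux name=m batch-size=${BATCH} width=640 height=480 live-source=1 ! \
queue ! nvinfer config-file-path=./config_infer_primary_yoloV3.txt batch-size=${BATCH} \
! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=false"

# Print the full command for inspection.
echo "gst-launch-1.0 -v ${PIPELINE}"
```

To actually launch it, the printed line can be pasted into a shell, or the variable can be passed unquoted: gst-launch-1.0 -v $PIPELINE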
3. Built the custom parser library with the main post-processing code commented out:
diff --git a/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp b/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp
index c286772..bc4e12a 100644
--- a/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp
+++ b/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp
@@ -279,9 +279,13 @@ extern "C" bool NvDsInferParseCustomYoloV3(
{6, 7, 8},
{3, 4, 5},
{0, 1, 2}};
+#if 0
return NvDsInferParseYoloV3 (
outputLayersInfo, networkInfo, detectionParams, objectList,
kANCHORS, kMASKS);
+#else
+ return true;
+#endif
}
Then, rerunning the command from step 2, the displayed fps increased to 16, so the total fps increased to BATCH * 16 = 16 * 16 = 256 fps.
This is almost the same as the fps calculated from the trtexec log below:
fps = (1000 ms / 60.2764 ms) * batch ≈ 16.6 * 16 ≈ 265 fps
# /usr/src/tensorrt/bin/trtexec --loadEngine=model_b16_gpu0_int8.engine --plugins=./libnvdsinfer_custom_impl_Yolo.so --batch=16
...
[12/27/2021-11:04:44] [I] Average on 10 runs - GPU latency: 60.2679 ms - Host latency: 76.967 ms (end to end 120.168 ms, enqueue 1.22985 ms)
[12/27/2021-11:04:44] [I] Average on 10 runs - GPU latency: 59.0416 ms - Host latency: 75.6662 ms (end to end 117.91 ms, enqueue 1.20278 ms)
[12/27/2021-11:04:44] [I] Average on 10 runs - GPU latency: 60.2764 ms - Host latency: 77.0831 ms (end to end 120.211 ms, enqueue 1.19489 ms)
...
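For reference, the arithmetic behind these numbers can be reproduced with a couple of awk one-liners. This is a rough sketch: the helper names `fps_from_latency` and `batch_ms_from_fps` are made up here, and the second helper assumes each frame counted by fpsdisplaysink is one full batch (which is what multiplying the displayed fps by BATCH implies). The inputs ~5 and 16 are rounded on-screen readings, so the results are estimates only.

```shell
# Throughput estimate from a per-batch GPU latency: fps = 1000 / latency_ms * batch
fps_from_latency() {
    awk -v ms="$1" -v batch="$2" 'BEGIN { printf "%.1f", 1000 / ms * batch }'
}

# Wall time per batch implied by a displayed fps reading: ms = 1000 / fps
# (assumes one displayed frame == one batch)
batch_ms_from_fps() {
    awk -v fps="$1" 'BEGIN { printf "%.1f", 1000 / fps }'
}

echo "trtexec estimate (60.2764 ms, batch 16): $(fps_from_latency 60.2764 16) fps"
echo "time per batch with the parser enabled:  $(batch_ms_from_fps 5) ms"
echo "time per batch with the parser disabled: $(batch_ms_from_fps 16) ms"
```

Without rounding 1000 / 60.2764 down to 16, the trtexec estimate comes out at about 265 fps. With the parser disabled, the pipeline spends roughly 62.5 ms per batch, close to the ~60 ms GPU latency, while with the parser enabled it spends roughly 200 ms per batch. Under the assumptions above, that gap of about 137 ms per batch is consistent with the post-processing being the bottleneck.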