Yolo V3 slow

The FPS is very stable even when computed over 5-second intervals. And even if it weren't stable, the value would simply be the number of frames processed in those 5 seconds: if the application stalls for 5 seconds, we want to know it.
I also think it's better to compute it every 5 seconds rather than averaging over everything since the application started. With the since-start average, if your application stops after 5 hours, the reported FPS only decays slowly and never goes down to 0.
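A minimal C++ sketch of the two ways of computing FPS discussed above, assuming a counter that is told how many frames were processed and is polled once per interval. `FpsCounter` and `simulate_stall` are hypothetical helpers, not DeepStream APIs, and the 16 FPS / 5-hour / 10-minute-stall scenario is made up for illustration.

```cpp
#include <cassert>

// Hypothetical helper (not a DeepStream API): reports FPS over fixed
// intervals and, for comparison, averaged since the application started.
class FpsCounter {
public:
    explicit FpsCounter(double interval_s) : interval_s_(interval_s) {
        assert(interval_s > 0.0);
    }

    // Record frames processed since the last call.
    void add_frames(long frames) {
        interval_frames_ += frames;
        total_frames_ += frames;
    }

    // Call once per interval: FPS over the interval that just ended, then reset.
    double interval_fps() {
        double fps = interval_frames_ / interval_s_;
        interval_frames_ = 0;
        return fps;
    }

    // FPS averaged over everything since the application started.
    double cumulative_fps(double elapsed_s) const {
        return elapsed_s > 0.0 ? total_frames_ / elapsed_s : 0.0;
    }

private:
    double interval_s_;
    long interval_frames_ = 0;
    long total_frames_ = 0;
};

// Hypothetical scenario: run_intervals 5-second intervals at `fps`, then
// stall_intervals intervals with no frames at all. Reports the last value
// each method would print.
void simulate_stall(double fps, int run_intervals, int stall_intervals,
                    double& last_interval_fps, double& last_cumulative_fps) {
    const double interval_s = 5.0;
    FpsCounter counter(interval_s);
    double elapsed_s = 0.0;
    for (int i = 0; i < run_intervals + stall_intervals; ++i) {
        if (i < run_intervals) {
            counter.add_frames(static_cast<long>(fps * interval_s));
        }
        elapsed_s += interval_s;
        last_interval_fps = counter.interval_fps();
        last_cumulative_fps = counter.cumulative_fps(elapsed_s);
    }
}
```

For example, `simulate_stall(16.0, 3600, 120, a, b)` (5 hours at 16 FPS, then a 10-minute stall) leaves `a` at 0 while `b` still reads about 15.5 FPS, which is why the per-interval number is the one that exposes a stall.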

OK, there should be some variation, but not much. Also, the TOT FPS is per stream, not total, so the total FPS is 80+. I also used trtexec to benchmark YOLOv3 performance:
/usr/src/tensorrt/bin/trtexec --loadEngine=model_b4_gpu0_int8.engine --plugins=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so

[12/23/2021-08:22:32] [I] GPU Compute Time: min = 15.2706 ms, max = 16.0349 ms, mean = 15.5126 ms, median = 15.5229 ms, percentile(99%) = 15.9949 ms

So the maximum FPS for YOLOv3 (batch 4, mean GPU compute time ≈ 15.51 ms per batch) is about 1000 / (15.51 / 4) ≈ 258 FPS.
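The arithmetic above can be written as a small C++ helper. This is a sketch under the assumption that the GPU processes one batch at a time and that nothing else in the pipeline (decode, pre-processing, post-processing) is the bottleneck, so it gives an upper bound rather than an expected pipeline FPS. Both function names are hypothetical.

```cpp
#include <cassert>

// Upper bound on total throughput implied by a per-batch latency measured
// with trtexec, assuming batches run back to back and nothing else limits
// the pipeline. Hypothetical helper for illustration.
double max_fps_from_batch_latency(double batch_latency_ms, int batch_size) {
    assert(batch_latency_ms > 0.0 && batch_size > 0);
    // Time per frame = batch latency / batch size;
    // frames per second = 1000 ms / time per frame.
    return 1000.0 / (batch_latency_ms / batch_size);
}

// Per-stream FPS when `num_streams` sources share the total throughput
// evenly (as with one frame per stream per batch in nvstreammux).
double per_stream_fps(double total_fps, int num_streams) {
    assert(num_streams > 0);
    return total_fps / num_streams;
}
```

With the batch-4 figure above, `max_fps_from_batch_latency(15.5126, 4)` gives about 258 FPS. Applied to the batch-16 trtexec log later in this thread, `max_fps_from_batch_latency(60.2764, 16)` gives about 265 FPS in total, or about 16.6 FPS per stream across 16 streams.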

Overall, I can reproduce this performance issue: throughput is lower than expected, and the drop appears to be caused by the YoloV3 post-processor (the custom bounding-box parser).

Here are my tests:
1. Tried your GitHub code.
After only adding the RTSP link, e.g. rtsp://${RTSP_IP}/media/video1, it failed to run.

2. Tried the command below.
The displayed FPS is ~5, so the total FPS = BATCH * 5 = 16 * 5 = 80 FPS.

BATCH=16  # must match the number of nvstreammux sink pads (m.sink_0 .. m.sink_15)
gst-launch-1.0 -v rtspsrc location=rtsp://${RTSP_IP}/media/video1 latency=100 ! tee name=t \
        t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_0 \
        t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_1 \
        t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_2 \
        t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_3 \
        t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_4 \
        t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_5 \
        t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_6 \
        t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_7 \
        t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_8 \
        t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_9 \
        t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_10 \
        t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_11 \
        t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_12 \
        t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_13 \
        t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_14 \
        t. ! queue ! rtph264depay ! nvv4l2decoder ! queue ! m.sink_15 \
        nvstreammux name=m batch-size=$BATCH width=640 height=480 live-source=1 ! \
        queue ! nvinfer config-file-path=./config_infer_primary_yoloV3.txt batch-size=$BATCH \
        ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=false

3. Built a version with the main post-processing code disabled (the parser returns true without parsing any detections):

diff --git a/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp b/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp
index c286772..bc4e12a 100644
--- a/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp
+++ b/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp
@@ -279,9 +279,13 @@ extern "C" bool NvDsInferParseCustomYoloV3(
         {6, 7, 8},
         {3, 4, 5},
         {0, 1, 2}};
+#if 0
     return NvDsInferParseYoloV3 (
         outputLayersInfo, networkInfo, detectionParams, objectList,
         kANCHORS, kMASKS);
+#else
+       return true;
+#endif
 }

Then, running the command line from step 2, the displayed FPS increased to 16, so the total FPS increased to BATCH * 16 = 16 * 16 = 256 FPS.

This is close to the FPS calculated from the batch-16 trtexec log below:
fps = (1000 ms / 60.2764 ms) * batch ≈ 16.6 * 16 ≈ 265 fps

# /usr/src/tensorrt/bin/trtexec --loadEngine=model_b16_gpu0_int8.engine --plugins=./libnvdsinfer_custom_impl_Yolo.so --batch=16
...
[12/27/2021-11:04:44] [I] Average on 10 runs - GPU latency: 60.2679 ms - Host latency: 76.967 ms (end to end 120.168 ms, enqueue 1.22985 ms)
[12/27/2021-11:04:44] [I] Average on 10 runs - GPU latency: 59.0416 ms - Host latency: 75.6662 ms (end to end 117.91 ms, enqueue 1.20278 ms)
[12/27/2021-11:04:44] [I] Average on 10 runs - GPU latency: 60.2764 ms - Host latency: 77.0831 ms (end to end 120.211 ms, enqueue 1.19489 ms)
...
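The measurements from steps 2 and 3 (80 FPS total with the parser, 256 FPS total with it disabled, batch 16) can be turned into a rough estimate of how much time the YoloV3 parser costs. This C++ sketch assumes the stages run back to back for each batch rather than overlapping, so the difference in batch period is attributed entirely to the parser; it is an estimate, not a profiler measurement, and the names are hypothetical.

```cpp
#include <cassert>

// Rough parser cost inferred from two end-to-end runs. Hypothetical type.
struct ParserCost {
    double per_batch_ms;
    double per_frame_ms;
};

// Milliseconds between consecutive batches at a given total throughput.
double batch_period_ms(double total_fps, int batch_size) {
    assert(total_fps > 0.0 && batch_size > 0);
    return 1000.0 * batch_size / total_fps;
}

// Assumes inference and parsing are serialized per batch, so the extra
// batch period seen with the parser enabled is all parser time.
ParserCost estimate_parser_cost(double fps_with_parser,
                                double fps_without_parser, int batch_size) {
    double with_ms = batch_period_ms(fps_with_parser, batch_size);
    double without_ms = batch_period_ms(fps_without_parser, batch_size);
    double extra_ms = with_ms - without_ms;
    return {extra_ms, extra_ms / batch_size};
}
```

With the numbers above, `estimate_parser_cost(80.0, 256.0, 16)` gives a batch period of 200 ms versus 62.5 ms, i.e. roughly 137.5 ms of parsing per 16-frame batch (about 8.6 ms per frame). If the stages actually overlap, the real parser time would differ, but the gap is large enough to support the conclusion that the post-processor is the bottleneck.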
