DeepStream 6.0 Python YOLO bad performance

• Hardware Platform (Jetson / GPU) Jetson Xavier NX
• DeepStream Version 6.0
• JetPack Version (valid for Jetson only) 4.6
• TensorRT Version 8.0.1-1

Hello everyone,
I am trying to use a YOLOv3 model in deepstream_python_apps, but I am getting bad performance.

Since DeepStream 6.0 has a known performance issue with YOLO, I applied the patch attached to this topic:
https://forums.developer.nvidia.com/t/deepstream-6-yolo-performance-issue/194238

After applying the patch, I got about 55 fps by running this example

/opt/nvidia/deepstream/deepstream-6.0/sources/objectDetector_Yolo/deepstream_app_config_yoloV3.txt

I also tested deepstream_app_config_yoloV3.txt with two IP cameras as input and got about 30~40 fps.

But after I used the rebuilt libnvdsinfer_custom_impl_Yolo.so from Python, the frame rate dropped to 3 fps.
I used this example code:

/opt/nvidia/deepstream/deepstream-6.0/sources/deepstream_python_apps/apps/deepstream-test3

and changed the files to use YOLOv3 by following these steps:

# cd  /opt/nvidia/deepstream/deepstream-5.0/sources/python/apps/deepstream-test3/
# cp ../../../objectDetector_Yolo/config_infer_primary_yoloV3.txt   dstest3_pgie_config.txt
# cp ../../../objectDetector_Yolo/yolov3.cfg ./
# cp ../../../objectDetector_Yolo/yolov3.weights  ./
# mkdir  nvdsinfer_custom_impl_Yolo
# cp ../../../objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so ./nvdsinfer_custom_impl_Yolo/
# cp ../../../objectDetector_Yolo/labels.txt ./
# python deepstream_test_3.py rtsp://127.0.0.1/video1 rtsp://127.0.0.1/video2
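
When the same model runs at very different speeds in two apps, one thing worth ruling out is that the Python app loads a different engine, precision, or custom library than deepstream-app does. The following is only a minimal sketch of such a check, not a confirmed fix for this issue. It reads the `[property]` group of an nvinfer config with Python's standard `configparser`. The `SAMPLE_CONFIG` text and its values are hypothetical and are not taken from the post; in practice you would read your own dstest3_pgie_config.txt instead.

```python
import configparser

# Keys from the [property] group that are most relevant when comparing setups.
KEYS = ("batch-size", "network-mode", "model-engine-file", "custom-lib-path")

# Gst-nvinfer precision codes for the network-mode key.
PRECISION = {"0": "FP32", "1": "INT8", "2": "FP16"}

# Hypothetical excerpt of a copied YOLOv3 config; values are illustrative only.
SAMPLE_CONFIG = """
[property]
gpu-id=0
batch-size=1
network-mode=1
model-engine-file=model_b1_gpu0_int8.engine
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
"""


def summarize_pgie_config(text):
    """Return the selected [property] keys; missing keys map to None."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    prop = parser["property"]
    return {key: prop.get(key) for key in KEYS}


if __name__ == "__main__":
    summary = summarize_pgie_config(SAMPLE_CONFIG)
    for key, value in summary.items():
        print(f"{key} = {value}")
    mode = summary["network-mode"]
    print("precision:", PRECISION.get(mode, "unknown"))
```

To check a real file, pass its contents instead, for example `summarize_pgie_config(open("dstest3_pgie_config.txt").read())`, and compare the result with the same summary for the config used by deepstream-app.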

The result was like this:

Warning: gst-stream-error-quark: No decoder available for type 'application/x-rtp, media=(string)application, payload=(int)107, clock-rate=(int)90000, encoding-name=(string)VND.ONVIF.METADATA, a-recvonly=(string)"", clock-base=(uint)2215878722, seqnum-base=(uint)26690, npt-start=(guint64)0, play-speed=(double)1, play-scale=(double)1, ssrc=(uint)224613528'. (6): gsturidecodebin.c(921): unknown_type_cb (): /GstPipeline:pipeline0/GstBin:source-bin-00/GstURIDecodeBin:uri-decode-bin
Decodebin child added: nvv4l2decoder0 

Decodebin child added: nvv4l2decoder1 

Opening in BLOCKING MODE 
Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 261 
NvMMLiteOpen : Block : BlockType = 261 
NVMEDIA: Reading vendor.tegra.display-size : status: 6 
NVMEDIA: Reading vendor.tegra.display-size : status: 6 
NvMMLiteBlockCreate : Block : BlockType = 261 
NvMMLiteBlockCreate : Block : BlockType = 261 
In cb_newpad

gstname= video/x-raw
features= <Gst.CapsFeatures object at 0x7f83746048 (GstCapsFeatures at 0x7eb40d3520)>
In cb_newpad

gstname= video/x-raw
features= <Gst.CapsFeatures object at 0x7f83746048 (GstCapsFeatures at 0x7ea0075f20)>
**********************FPS*****************************************
Fps of stream 1 is  3.2
**********************FPS*****************************************
Fps of stream 0 is  3.2
**********************FPS*****************************************
Fps of stream 1 is  3.0
**********************FPS*****************************************
Fps of stream 0 is  3.0

Please help me to fix it.
Thank you so much!

Please check Deepstream 6 YOLO performance issue - #22 by mchi

Thx!

Hi, sorry, but I have already applied that patch.
The problem is that the patch does not work with deepstream_python_apps.

Please try capturing the DeepStream latency log with "export NVDS_ENABLE_LATENCY_MEASUREMENT=1". Refer to the Troubleshooting page of the DeepStream 6.0 Release documentation.

I am not really sure if this is what you asked for, but this is the result of running the DeepStream Python app after "export NVDS_ENABLE_LATENCY_MEASUREMENT=1".

No, the log should look like the screenshot in the topic Deepstream frame latency: frame number still stuck at zero (DS6, JP4.6)

Or can you try using trtexec to run the TensorRT engine generated by DeepStream?

Hi @mchi
Sorry, but does "export NVDS_ENABLE_LATENCY_MEASUREMENT=1" work with the DeepStream Python app?
I checked that NVDS_ENABLE_LATENCY_MEASUREMENT is set to 1, but I cannot get any log like the screenshot in the link you attached.

Also, I used trtexec to run the TensorRT engine, but I am not sure if I ran it correctly.
This is the command I used:

./trtexec --loadEngine=/opt/nvidia/deepstream/deepstream-6.0/sources/objectDetector_Yolo/model_b2_gpu0_int8.engine --plugins=/opt/nvidia/deepstream/deepstream-6.0/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so

And this is the result of running trtexec:


Hi @yoojin
Thanks for the trtexec log!
From the log, the pipeline should support ~40fps (1000 ms / ~25ms).
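
The estimate above is just the reciprocal of the per-inference latency. As a small worked sketch, assuming the ~25 ms figure is the time for one inference pass read from the trtexec log (the function name `fps_from_latency_ms` is made up for illustration):

```python
def fps_from_latency_ms(latency_ms):
    """Convert a per-inference latency in milliseconds to inferences per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # 1000 ms / 25 ms = 40 inferences per second
    print(fps_from_latency_ms(25.0))  # prints 40.0
```

This only bounds what the inference engine can sustain on its own. The 3 fps seen in the Python app is far below that bound, which suggests the bottleneck lies somewhere other than the engine itself.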

Sorry! Yes, "export NVDS_ENABLE_LATENCY_MEASUREMENT=1" does not work for Python DS apps.

Please refer to How to get the latency from deepstream python apps - #13 by Fiona.Chen to capture the plugin latency.

Thanks!

