Deepstream 6 YOLO performance issue

I have just downloaded DeepStream 6.0, and the YOLO example app runs extremely slowly.
When I run the example on my Xavier NX with DeepStream 5.0, it runs at around 56 fps.
The same file on the same NX with DeepStream 6.0 runs at around 6 fps.
Here is a video to explain:

Hi @adventuredaisy, your issue is very strange. I run a custom YOLOv3 model on DS 6.0 without any problems: my 20-class custom YOLOv3 gets about 15 fps, and YOLOv3-tiny gets higher fps.

I suggest using JetPack 4.6 and checking the main config file for deepstream-app. I once saw low fps caused by a badly configured RTSP pipeline, and many plugins can cause low fps. Could you share your pipeline or your main config?

I’m using DS 6.0 at my company, so I’d like to hear how this goes!

I am running the sample apps that ship with DeepStream, and I have not made any modifications to them.

When I run this example on my Xavier NX using JetPack 4.6 and DeepStream 5.0, I get 58 fps:

/opt/nvidia/deepstream/deepstream-5.0/sources/objectDetector_Yolo

But when I run this example on the same Xavier NX using JetPack 4.6 and DeepStream 6.0, I only get 6 fps:

/opt/nvidia/deepstream/deepstream-6.0/sources/objectDetector_Yolo

I have worked with the YOLO model in DeepStream since DeepStream first came out,
and I have always achieved excellent performance from DeepStream and the YOLO applications.

That is why I find this odd. I have checked everything I could think of to see what is throttling the performance,
and the only difference I can find is the use of DeepStream 6.

Thanks, @nvplayer!

@adventuredaisy ,
Could you share the output of
$ cat /etc/nv_tegra_release

Did you boost the clocks?
$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Also, please share the output of “sudo tegrastats” while the issue is reproducing.

And what fps do you get with this command?
$ /usr/src/tensorrt/bin/trtexec --loadEngine=$TRT_ENGINE_GENERATED_IN_DeepStream

Thanks!

mchi
Attached is a screenshot with the results of:

$ cat /etc/nv_tegra_release

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

sudo nvpmodel -m 0 gives this result:

nx@nx-desktop:~$ sudo nvpmodel -m 0
NVPM WARN: patching tpc_pg_mask: (0x1:0x4)
NVPM WARN: patched tpc_pg_mask: 0x4

I will work on the “sudo tegrastats” info.

RAM 6243/7773MB (lfb 160x4MB) SWAP 44/3887MB (cached 0MB) CPU [25%@1190,43%@1190,off,off,off,off] EMC_FREQ 60%@1600 GR3D_FREQ 99%@1109 NVDEC 665 NVDEC1 665 VIC_FREQ 0%@115 APE 150 MTS fg 1% bg 5% AO@38C GPU@39.5C PMIC@50C AUX@36C CPU@38C thermal@37.5C VDD_IN 15336/15336 VDD_CPU_GPU_CV 8370/8370 VDD_SOC 2164/2164
RAM 6243/7773MB (lfb 160x4MB) SWAP 44/3887MB (cached 0MB) CPU [27%@1900,34%@1903,off,off,off,off] EMC_FREQ 61%@1600 GR3D_FREQ 11%@1109 NVDEC 665 NVDEC1 665 VIC_FREQ 38%@115 APE 150 MTS fg 1% bg 6% AO@38.5C GPU@39C PMIC@50C AUX@36C CPU@38C thermal@38C VDD_IN 14804/15070 VDD_CPU_GPU_CV 7880/8125 VDD_SOC 2123/2143
RAM 6243/7773MB (lfb 160x4MB) SWAP 44/3887MB (cached 0MB) CPU [26%@1651,41%@1497,off,off,off,off] EMC_FREQ 63%@1600 GR3D_FREQ 99%@1109 NVDEC 665 NVDEC1 665 VIC_FREQ 0%@115 APE 150 MTS fg 1% bg 8% AO@38.5C GPU@40C PMIC@50C AUX@36.5C CPU@38C thermal@38.15C VDD_IN 15336/15158 VDD_CPU_GPU_CV 8248/8166 VDD_SOC 2204/2163
RAM 6243/7773MB (lfb 160x4MB) SWAP 44/3887MB (cached 0MB) CPU [28%@1904,32%@1904,off,off,off,off] EMC_FREQ 63%@1600 GR3D_FREQ 94%@1109 NVDEC 665 NVDEC1 665 VIC_FREQ 87%@115 APE 150 MTS fg 1% bg 6% AO@39C GPU@40.5C PMIC@50C AUX@36.5C CPU@38C thermal@38.15C VDD_IN 15090/15141 VDD_CPU_GPU_CV 8125/8155 VDD_SOC 2123/2153
RAM 6243/7773MB (lfb 160x4MB) SWAP 44/3887MB (cached 0MB) CPU [40%@1903,31%@1904,off,off,off,off] EMC_FREQ 64%@1600 GR3D_FREQ 99%@1109 NVDEC 665 NVDEC1 665 VIC_FREQ 0%@115 APE 150 MTS fg 1% bg 6% AO@39C GPU@40.5C PMIC@50C AUX@36.5C CPU@38.5C thermal@38.2C VDD_IN 15376/15188 VDD_CPU_GPU_CV 8288/8182 VDD_SOC 2204/2163
RAM 6243/7773MB (lfb 160x4MB) SWAP 44/3887MB (cached 0MB) CPU [25%@1190,37%@1190,off,off,off,off] EMC_FREQ 63%@1600 GR3D_FREQ 99%@1109 NVDEC 665 NVDEC1 665 VIC_FREQ 0%@115 APE 150 MTS fg 1% bg 8% AO@39C GPU@40.5C PMIC@50C AUX@37C CPU@38.5C thermal@38.15C VDD_IN 15336/15213 VDD_CPU_GPU_CV 8329/8206 VDD_SOC 2164/2163
RAM 6243/7773MB (lfb 160x4MB) SWAP 44/3887MB (cached 0MB) CPU [33%@1904,34%@1903,off,off,off,off] EMC_FREQ 63%@1600 GR3D_FREQ 64%@1109 NVDEC 665 NVDEC1 665 VIC_FREQ 2%@115 APE 150 MTS fg 0% bg 0% AO@39C GPU@40.5C PMIC@50C AUX@37C CPU@39C thermal@38.5C VDD_IN 14927/15172 VDD_CPU_GPU_CV 8003/8177 VDD_SOC 2123/2157
RAM 6243/7773MB (lfb 160x4MB) SWAP 44/3887MB (cached 0MB) CPU [42%@1190,21%@1190,off,off,off,off] EMC_FREQ 64%@1600 GR3D_FREQ 99%@1109 NVDEC 665 NVDEC1 665 VIC_FREQ 0%@115 APE 150 MTS fg 1% bg 4% AO@39.5C GPU@41C PMIC@50C AUX@37C CPU@39C thermal@38.5C VDD_IN 15336/15192 VDD_CPU_GPU_CV 8288/8191 VDD_SOC 2204/2163
RAM 6243/7773MB (lfb 160x4MB) SWAP 44/3887MB (cached 0MB) CPU [34%@1903,36%@1904,off,off,off,off] EMC_FREQ
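To read a capture like this, the most useful fields are GR3D_FREQ (GPU load and GPU clock in MHz) and EMC_FREQ (memory controller load). Below is a minimal Python sketch, assuming the lines are saved exactly as tegrastats prints them, that averages those fields; a GR3D_FREQ that sits near 99% while the CPU cores are lightly loaded suggests the pipeline is GPU-bound, i.e. limited by inference rather than by decoding or the CPU. The function names are hypothetical and not part of any NVIDIA tool.

```python
import re
from statistics import mean

# Matches fields such as "GR3D_FREQ 99%@1109" (GPU), "EMC_FREQ 60%@1600"
# (memory controller) and "VIC_FREQ 0%@115" (video image compositor).
FIELD_RE = re.compile(r"(GR3D_FREQ|EMC_FREQ|VIC_FREQ) (\d+)%@(\d+)")


def parse_tegrastats_line(line):
    """Return {field: (utilization_percent, clock_mhz)} for the fields found in one line."""
    return {name: (int(pct), int(mhz)) for name, pct, mhz in FIELD_RE.findall(line)}


def summarize(lines):
    """Average utilization per field across all lines that report that field."""
    samples = {}
    for line in lines:
        for name, (pct, _mhz) in parse_tegrastats_line(line).items():
            samples.setdefault(name, []).append(pct)
    return {name: mean(values) for name, values in samples.items()}
```

A line that is cut off mid-field, like the last one in the capture, simply contributes nothing for the missing fields.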

I tried DS 6.0 GA on Jetson NX / JetPack 4.6; its perf is about 14 fps, as shown below.
I’ll check DS 5.1 again.

root@nvidia-desktop:/opt/nvidia/deepstream/deepstream/sources/objectDetector_Yolo# deepstream-app -c deepstream_app_config_yoloV3.txt

NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
** INFO: <bus_callback:180>: Pipeline running

**PERF: 16.61 (15.40)
**PERF: 14.07 (14.51)
**PERF: 14.19 (14.46)
**PERF: 14.49 (14.38)
**PERF: 14.24 (14.39)
**PERF: 13.46 (14.24)
**PERF: 14.56 (14.23)
**PERF: 14.48 (14.31)
**PERF: 13.85 (14.20)
**PERF: 14.31 (14.24)
**PERF: 14.00 (14.24)
**PERF: 14.35 (14.24)
**PERF: 14.15 (14.20)
**PERF: 13.88 (14.20)
**PERF: 14.26 (14.19)
**PERF: 14.10 (14.20)
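The **PERF lines can be reduced to a single steady-state number, which makes DS 5.1 and DS 6.0 runs easier to compare. Below is a minimal Python sketch, assuming the single-stream format shown above, where the first value is the fps over the latest measurement interval and the value in parentheses is the running average; the helper names are hypothetical.

```python
import re
from statistics import mean

# Single-stream deepstream-app lines look like "**PERF: 14.07 (14.51)".
# The "**PERF:  FPS 0 (Avg)" header line does not match because it has no numbers.
PERF_RE = re.compile(r"\*\*PERF:\s+([\d.]+)\s+\(([\d.]+)\)")


def parse_perf(log_text):
    """Return (interval_fps, running_avg_fps) tuples, one per **PERF line."""
    return [(float(a), float(b)) for a, b in PERF_RE.findall(log_text)]


def steady_state_fps(log_text, skip_first=1):
    """Mean interval fps, skipping the first sample(s), which include pipeline warm-up."""
    samples = [fps for fps, _avg in parse_perf(log_text)][skip_first:]
    return mean(samples) if samples else float("nan")
```

Dropping the first interval matters here because it covers start-up; in the log above it reads 16.61 while the later samples settle around 14.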

mchi
Here is a video comparing JetPack 4.6 with DeepStream 6.0 against JetPack 4.6 with DeepStream 5.1.
JetPack 4.6 with DeepStream 5.1 runs at 58 fps using the same example.

https://youtu.be/OLp9yxe0DTY

I can reproduce this issue. We are checking it and will get back to you ASAP.

Thanks!

I know you guys are busy, but are there any updates?

Hi @adventuredaisy ,
We are checking this with priority.
So far, we have found that the inference time of some layers is much longer on DS 6.0 GA than on DS 5.1. We are working on a fix.
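Once per-layer timings are available from both versions (for example from trtexec's --dumpProfile output; how they are collected and matched by layer name is an assumption here), the slow layers can be picked out mechanically. Below is a minimal Python sketch that takes two hypothetical dictionaries mapping layer name to average milliseconds and returns the layers that slowed down by both a ratio and an absolute margin.

```python
def find_regressed_layers(baseline_ms, candidate_ms, min_ratio=1.5, min_delta_ms=0.1):
    """
    Compare per-layer average times (layer name -> milliseconds) from a baseline run
    (e.g. DS 5.1) and a candidate run (e.g. DS 6.0). Return (name, old_ms, new_ms, ratio)
    for layers that got slower by at least min_ratio AND min_delta_ms, worst ratio first.
    """
    regressions = []
    for name, new_ms in candidate_ms.items():
        old_ms = baseline_ms.get(name)
        if old_ms is None or old_ms <= 0:
            # Layer missing from the baseline (renamed or fused differently): cannot compare.
            continue
        ratio = new_ms / old_ms
        if ratio >= min_ratio and new_ms - old_ms >= min_delta_ms:
            regressions.append((name, old_ms, new_ms, ratio))
    return sorted(regressions, key=lambda r: r[3], reverse=True)
```

The absolute margin keeps tiny layers from being flagged just because, say, 0.01 ms became 0.03 ms. The layer names in any input are whatever the engine reports; the ones used to exercise this sketch are made up.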

Thanks for the update

I know you guys will win the day!


How’s the fix coming?
Is there a timetable for when it will be rolled out?

Hey guys
I know you are working on the issue,
but I am dead in the water when it comes to moving my YOLO projects from DS 5.1 to DeepStream 6.
I'm just wondering when a fix will be coming out.

Thanks
Joe Valdivia

Noted, and sorry! Still working on it; will get back to you ASAP.

Hi @adventuredaisy ,
This issue is still being debugged. It may be related to nvdsinfer_custom_impl_Yolo/trt_utils.cpp, which builds the TensorRT model from the cfg file.
If this is urgent for you, could you try the TAO YOLOv3 network from NVIDIA-AI-IOT/deepstream_tao_apps on GitHub (sample apps that demonstrate how to deploy models trained with TAO on DeepStream)?
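For context on what building the model from the cfg file involves: a Darknet .cfg file is a sequence of [section] headers ([net], [convolutional], [yolo], ...), each followed by key=value lines, and a builder walks those sections in order to create the corresponding network layers. The Python sketch below only illustrates parsing that format; it is an assumption-labeled illustration, not the code in trt_utils.cpp, and parse_darknet_cfg is a hypothetical name.

```python
def parse_darknet_cfg(text):
    """
    Parse Darknet .cfg text into a list of (section_type, {key: value}) blocks in file
    order, e.g. ("net", {...}), ("convolutional", {...}), ("yolo", {...}).
    Values are kept as strings; converting them is up to the caller.
    """
    blocks = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop '#' comments and surrounding whitespace
        if not line or line.startswith(";"):  # skip blank lines and ';' comment lines
            continue
        if line.startswith("[") and line.endswith("]"):
            blocks.append((line[1:-1].strip(), {}))
        elif "=" in line and blocks:
            key, value = line.split("=", 1)
            blocks[-1][1][key.strip()] = value.strip()
    return blocks
```

Counting the section types in the result (for example how many [convolutional] and [yolo] blocks there are) is a quick sanity check that the cfg file in use matches the expected network.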

Thanks for the update.
I’m not in any bind at the moment.
I thought I would explore Omniverse Isaac Sim in the meantime.
I'm kind of excited about the synthetic data generator that's coming out.

@mchi Any update on the issue?

Hi All,
Sorry for long delay!

Attached is the fix for this perf regression. I verified it on my side.

$ cd /opt/nvidia/deepstream/deepstream-6.0/sources/objectDetector_Yolo/
$ patch -p1 < DS6.0GA_objectDetector_Yolo_perf_regression.patch
$ export CUDA_VER=<version>   # specify the CUDA version, e.g. 10.2 on Jetson with JetPack 4.6, 11.4 on x86 dGPU
$ make -C nvdsinfer_custom_impl_Yolo

DS6.0GA_objectDetector_Yolo_perf_regression.patch (2.5 KB)

Thanks!


How do I apply this?
I tried this:

/opt/nvidia/deepstream/deepstream-6.0/sources/objectDetector_Yolo$ sudo git apply /home/nx/DS6.0_objectDetector_Yolo_perf_regression.patch

but it returned this:

warning: nvdsinfer_custom_impl_Yolo/nvdsinfer_yolo_engine.cpp has type 100755, expected 100644
error: cannot apply binary patch to ‘nvdsinfer_custom_impl_Yolo/nvdsinfer_yolo_engine.o’ without full index line
error: nvdsinfer_custom_impl_Yolo/nvdsinfer_yolo_engine.o: patch does not apply
warning: nvdsinfer_custom_impl_Yolo/yolo.cpp has type 100755, expected 100644
warning: nvdsinfer_custom_impl_Yolo/yolo.h has type 100755, expected 100644
error: cannot apply binary patch to ‘nvdsinfer_custom_impl_Yolo/yolo.o’ without full index line
error: nvdsinfer_custom_impl_Yolo/yolo.o: patch does not apply