Inference performance of DS5.0 is lower than that of DS4.0?

Hi,
Environment:
GPU: NVIDIA T4, Ubuntu 18.04, GStreamer 1.14.1, NVIDIA driver 440+, CUDA 10.2, TensorRT 7.0, DeepStream 5.0
Running the same deepstream-app on the same hardware (T4), the inference performance of DS5.0 is lower than that of DS4.0.
DS5.0 also hits a bottleneck even though GPU and memory utilization never reach 100%.
Is there any way to improve inference performance?
Part of the config file:
[source0]
enable=1
type=4
uri=rtsp://192.168.170.65:554/xxx
num-sources=1
gpu-id=0

[sink0]
enable=1
type=4
sync=0
source-id=0
gpu-id=0
nvbuf-memory-type=0
rtsp-port=21000
udp-port=31000
bitrate=1200000
codec=1

[osd]
enable=1
gpu-id=0
border-width=1
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0
process-mode=1

[streammux]
gpu-id=0
live-source=1
batch-size=10
batched-push-timeout=40000
width=1920
height=1080
enable-padding=0

[primary-gie]
enable=1
gpu-id=0
model-engine-file=model_b64_gpu0_int8.engine
labelfile-path=labels.txt
batch-size=10
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary_yoloV3.txt

Could you share your detailed steps and what you observed for your conclusion?

Thanks!

Same Hardware: T4
App: deepstream-app(YoloV3) + 10 rtsp cameras
In DS4.0


image

In DS5.0


Hi @Mr.Z
But from your screenshots, DS5.0 is running on a Quadro P2000, while DS4.0 is running on a T4.

Hi,
It was running on T4.

Got it. It looks like the T4s were running at different clocks: one at 1425 MHz, the other at 1080 MHz.

Could you run the commands below to lock the GPU clock and test again?

$ sudo nvidia-smi -pm ENABLED -i $T4_GPU_ID      # change $T4_GPU_ID to the GPU id of the T4, e.g. 0
$ sudo nvidia-smi -ac "5001,1590" -i $T4_GPU_ID  # set application clocks (memory,graphics in MHz)
$ sudo nvidia-smi -q -d CLOCK -i $T4_GPU_ID      # check the clock setting
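To see whether the lock actually holds while the pipeline is running, the clock can be sampled over time instead of read from a single snapshot. Below is a rough sketch, not an official tool: the function names `sample_clocks` and `summarize_clocks` are made up for illustration, and it assumes `nvidia-smi` supports the `clocks.current.graphics` and `clocks.applications.graphics` query fields on your driver.

```shell
# Summarize sampled clocks read from stdin.
# Each input line is "current, application" in MHz, as printed by
# nvidia-smi with --format=csv,noheader,nounits.
summarize_clocks() {
  awk -F', *' '
    NR == 1 { min = $1 + 0; max = $1 + 0; target = $2 + 0 }
    { cur = $1 + 0; if (cur < min) min = cur; if (cur > max) max = cur }
    END {
      if (NR > 0)
        printf "samples=%d min=%d max=%d application=%d\n", NR, min, max, target
    }
  '
}

# Sample the graphics clock once per second.
# $1 = GPU id, $2 = number of samples. Requires nvidia-smi and a GPU.
sample_clocks() {
  i=0
  while [ "$i" -lt "$2" ]; do
    nvidia-smi --query-gpu=clocks.current.graphics,clocks.applications.graphics \
               --format=csv,noheader,nounits -i "$1"
    sleep 1
    i=$((i + 1))
  done
}
```

While deepstream-app is running, something like `sample_clocks "$T4_GPU_ID" 30 | summarize_clocks` would show the minimum and maximum graphics clock seen over 30 seconds next to the requested application clock. If the minimum is far below the application clock, the lock is not holding under load.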

Thanks!

Hi,
The GPU's graphics clock changes between 300 and 1590 MHz while the apps are running.
The images above are just snapshots.
image

Could you lock the GPU clock and check the performance again?

Hi,
We have tested your suggestion, and the result is the same.
image

And the GPU clock still changes while the apps are running.

Hi @Mr.Z
Since it’s hard for us to set up 10 RTSP streams, could you refer to the section “The DeepStream application is running slowly.” in the FAQ to measure the latency of the plugins and narrow down which plugin causes the latency?

The DeepStream application is running slowly.
• Solution 1: One of the plugins in the pipeline may be running slowly.
You can measure the latency of each plugin in the pipeline to determine whether one of them is slow.
• To enable frame latency measurement, run this command on the console:
$ export NVDS_ENABLE_LATENCY_MEASUREMENT=1
• To enable latency for all plugins, run this command on the console:
$ export NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1
... 
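Put together, the two exports and the app launch could look like the sketch below. The config filename and the `latency.log` output name are placeholders I made up, so replace them with your own. Both variables have to be exported in the same shell that launches deepstream-app, otherwise the app never sees them.

```shell
# Hypothetical config path; replace with the config file you actually run.
CONFIG="${CONFIG:-deepstream_app_config_yoloV3.txt}"

# Frame-level and per-component latency measurement, as described in the FAQ.
export NVDS_ENABLE_LATENCY_MEASUREMENT=1
export NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1

# Launch from this same shell so the variables are inherited,
# and keep a copy of the console output for later inspection.
if command -v deepstream-app >/dev/null 2>&1 && [ -f "$CONFIG" ]; then
  deepstream-app -c "$CONFIG" 2>&1 | tee latency.log
else
  echo "deepstream-app or $CONFIG not found; skipping run" >&2
fi
```

Saving the output to a file makes it easier to compare the DS4.0 and DS5.0 runs afterwards.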

Thanks a lot!

Hi mchi,
We are seeing the same problem and have tested our config with your suggestions above, but there has been no progress.
Can you give us more advice? Thanks!

Can you give us more advice?

Can you try my suggestion above to find out which component causes the delay?

Hi mchi,
The suggestions above did not help.
I want to get the component latency, but I can’t get any result.
I have posted this problem in “Cannot get latency measurement result”, but it has not been solved yet.