Playing 25 parallel streams causing issue

Hi,
I am running DeepStream 5.1 on a Jetson AGX Xavier.
My pipeline feeds the same mp4 video 25 times via filesrc, runs object detection and tracking, and displays the result. When all 25 videos play, playback becomes very choppy and many frames are dropped.
The pipeline I am running is:

gst-launch-1.0
nvstreammux name=m batch-size=25 width=1920 height=1080 ! queue !
nvinfer config-file-path=/opt/nvidia/deepstream/deepstream-5.1/sources/apps/sample_apps/deepstream-test1/dstest1_pgie_config.txt batch-size=25 interval=1 unique-id=1 ! queue !
nvtracker ll-lib-file=/opt/nvidia/deepstream/deepstream-5.1/lib/libnvds_mot_klt.so ! queue !
nvinfer config-file-path=/opt/nvidia/deepstream/deepstream-5.1/samples/configs/deepstream-app/config_infer_secondary_carcolor.txt batch-size=25 interval=1 unique-id=2 infer-on-gie-id=1 infer-on-class-ids=0 ! queue !
nvmultistreamtiler rows=5 columns=5 width=1920 height=1080 ! queue ! nvvideoconvert ! queue ! nvdsosd ! queue !
nvegltransform ! queue ! nveglglessink
filesrc location=/home/nvidia/gst-example/highway.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! m.sink_0
filesrc location=/home/nvidia/gst-example/highway.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! m.sink_1
.
.
.
filesrc location=/home/nvidia/gst-example/highway.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! m.sink_24
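
The elided source branches above all follow one pattern, so they can be generated rather than typed by hand. This is a minimal sketch, assuming every stream uses the same file and the same decode chain as in the post; `build_source_branches` is a hypothetical helper name, not part of DeepStream or GStreamer.

```python
import shlex


def build_source_branches(location, count):
    """Return one decode branch per stream, linked to nvstreammux pad m.sink_<i>."""
    return [
        f"filesrc location={shlex.quote(location)} ! qtdemux ! h264parse "
        f"! nvv4l2decoder ! m.sink_{i}"
        for i in range(count)
    ]


# Same file and stream count as in the post.
branches = build_source_branches("/home/nvidia/gst-example/highway.mp4", 25)

# Join with shell line continuations so the text can be pasted after the
# nvstreammux ... nveglglessink part of the gst-launch-1.0 command.
print(" \\\n".join(branches))
```

The stream count passed here should match the batch-size given to nvstreammux and nvinfer (25 in this pipeline).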

When I run this pipeline, playback is very slow and choppy.
Can anyone suggest how I can get good performance without dropping frames?

Also, how can I check the FPS while the GStreamer pipeline is running?

Hi,
Please run sudo nvpmodel -m 0 and sudo jetson_clocks to put the Xavier in maximum-performance mode. All power modes are listed in the developer guide.

And please replace

... ! nveglglessink

with

... ! fpsdisplaysink text-overlay=0 video-sink=nveglglessink sync=0 -v

to print the FPS.
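
With -v, fpsdisplaysink reports its measurements as last-message lines on the console. The sketch below pulls the numbers out of such a line so they can be logged or compared between runs. The message shape in the regular expression is an assumption about what the element prints (it may use different fields when frames are being dropped), and the sample line is made up for illustration, not taken from a real run.

```python
import re

# Assumed message shape; adjust the pattern if your build prints other fields.
FPS_RE = re.compile(
    r"rendered: (\d+), dropped: (\d+), current: ([\d.]+), average: ([\d.]+)"
)


def parse_fps_line(line):
    """Return the FPS counters from one last-message line, or None if absent."""
    match = FPS_RE.search(line)
    if match is None:
        return None
    rendered, dropped, current, average = match.groups()
    return {
        "rendered": int(rendered),
        "dropped": int(dropped),
        "current": float(current),
        "average": float(average),
    }


# Hypothetical sample line (values invented for the example).
sample = "last-message = rendered: 300, dropped: 0, current: 13.20, average: 13.05"
print(parse_fps_line(sample))
```

Lines that do not match the assumed shape return None, so the function can be applied to the whole console output line by line.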

Hi @DaneLLL ,
Thank you for the reply.
I have set the board to max performance mode, but I still face the same problem.

Hi,
Probably the models are too heavy and dominate the performance. We have a sample config file for Xavier:

source30_1080p_dec_infer-resnet_tiled_display_int8.txt

Please modify it to apply your file sources and give it a try. See if running resnet10 can achieve target performance.

Hi,

The config file I am currently using for the pgie is dstest1_pgie_config.txt.
It already uses the resnet10 model files.
Here are the properties from the pgie config file:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-file=…/…/…/…/samples/models/Primary_Detector/resnet10.caffemodel
proto-file=…/…/…/…/samples/models/Primary_Detector/resnet10.prototxt
model-engine-file=…/…/…/…/samples/models/Primary_Detector/resnet10.caffemodel_b1_gpu0_int8.engine
labelfile-path=…/…/…/…/samples/models/Primary_Detector/labels.txt
int8-calib-file=…/…/…/…/samples/models/Primary_Detector/cal_trt.bin
force-implicit-batch-dim=1
batch-size=25
network-mode=1
num-detected-classes=4
interval=1
gie-unique-id=1
output-blob-names=conv2d_bbox;conv2d_cov/Sigmoid
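
For reference, the settings in this file that matter most for throughput can be read back programmatically. This is a minimal sketch using Python's configparser on a shortened copy of the [property] section above; `summarize_pgie` is a hypothetical helper. The mapping of network-mode values (0 = FP32, 1 = INT8, 2 = FP16) and the reading of interval as the number of batches skipped between inferences follow the Gst-nvinfer documentation as I understand it, so treat them as assumptions to verify.

```python
import configparser

# network-mode values as documented for Gst-nvinfer (assumption to verify).
PRECISION = {0: "FP32", 1: "INT8", 2: "FP16"}


def summarize_pgie(text):
    """Extract batch size, precision and inference cadence from a pgie config."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    prop = parser["property"]
    interval = prop.getint("interval", fallback=0)
    return {
        "batch_size": prop.getint("batch-size"),
        "precision": PRECISION[prop.getint("network-mode", fallback=0)],
        # interval=N skips N batches between inferences.
        "infer_every_n_batches": interval + 1,
    }


# Shortened copy of the [property] section shown above.
SAMPLE = """
[property]
gpu-id=0
batch-size=25
network-mode=1
interval=1
gie-unique-id=1
output-blob-names=conv2d_bbox;conv2d_cov/Sigmoid
"""

print(summarize_pgie(SAMPLE))
# {'batch_size': 25, 'precision': 'INT8', 'infer_every_n_batches': 2}
```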

Hi,
Are you able to run with only dstest1_pgie_config.txt?

It looks like you have two models in the pipeline, dstest1_pgie_config.txt and config_infer_secondary_carcolor.txt. Running only dstest1_pgie_config.txt is closer to source30_1080p_dec_infer-resnet_tiled_display_int8.txt.

Hi,
Sorry for the late reply.

I tried with only the primary detector (pgie), but the FPS is still only about 13.
Also, for monitoring we need to keep tracking enabled; otherwise objects are detected afresh on every frame, and the video is not pleasant to watch.

Hi,
Are you able to run the default config file to achieve 30fps?

/opt/nvidia/deepstream/deepstream-5.1/samples/configs/deepstream-app$ deepstream-app -c source30_1080p_dec_infer-resnet_tiled_display_int8.txt

Hi,
Yes, I am able to run this default config file and it is giving 30 FPS.

Hi,
Please replace

uri=file://../../streams/sample_1080p_h264.mp4

with

uri=file:///home/nvidia/gst-example/highway.mp4

and give it a try.

We suggest modifying the config file step by step to identify which change causes the performance drop. Then run sudo tegrastats to check whether any hardware engine is fully loaded and limiting performance.

Hi,
I ran deepstream-app after changing the source to ‘highway.mp4’.
For the first 8-10 seconds it runs at around 20 fps.
After that it starts dropping more frames and barely reaches 1-2 fps. At that point ‘sudo tegrastats’ shows the following:

RAM 6998/15817MB (lfb 1747x4MB) SWAP 0/7908MB (cached 0MB) CPU [15%@2265,14%@2265,25%@2265,18%@2265,15%@2265,12%@2265,16%@2265,13%@2265] EMC_FREQ 33%@2133 GR3D_FREQ 46%@1377 NVDEC 1190 NVDEC1 1190 VIC_FREQ 98%@1036 APE 150 MTS fg 0% bg 7% AO@52.5C GPU@56.5C Tdiode@56.75C PMIC@50C AUX@51C CPU@54.5C thermal@54.3C Tboard@44C GPU 10607/7227 CPU 2306/1382 SOC 7071/3944 CV 0/0 VDDRQ 1844/1059 SYS5V 3139/2756

RAM 7004/15817MB (lfb 1747x4MB) SWAP 0/7908MB (cached 0MB) CPU [19%@2265,18%@2265,24%@2265,21%@2265,16%@2265,14%@2265,18%@2265,17%@2265] EMC_FREQ 33%@2133 GR3D_FREQ 73%@1377 NVDEC 1190 NVDEC1 1190 VIC_FREQ 99%@1036 APE 150 MTS fg 0% bg 9% AO@52.5C GPU@57C Tdiode@57C PMIC@50C AUX@51C CPU@55C thermal@53.55C Tboard@44C GPU 11063/7275 CPU 2919/1402 SOC 7222/3985 CV 0/0 VDDRQ 1843/1068 SYS5V 3180/2761

RAM 7004/15817MB (lfb 1747x4MB) SWAP 0/7908MB (cached 0MB) CPU [11%@2265,12%@2265,25%@2265,21%@2265,15%@2265,16%@2265,12%@2265,10%@2265] EMC_FREQ 33%@2133 GR3D_FREQ 0%@1377 NVDEC 1190 NVDEC1 1190 VIC_FREQ 96%@1036 APE 150 MTS fg 0% bg 7% AO@53C GPU@56.5C Tdiode@57.25C PMIC@50C AUX@51.5C CPU@55C thermal@53.85C Tboard@44C GPU 10453/7314 CPU 2460/1415 SOC 7071/4023 CV 0/0 VDDRQ 1844/1078 SYS5V 3139/2766
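
To spot a fully loaded engine in output like this, the `NAME pct%@MHz` fields can be extracted and compared against a threshold. The sketch below does that for the first sample above (shortened after the engine fields). The 90% threshold and the helper names are arbitrary choices for illustration. On these samples VIC_FREQ sits at 96-99% while GR3D_FREQ stays between 0% and 73%, which suggests checking the VIC (video image compositor) load first.

```python
import re

# Matches fields such as "VIC_FREQ 98%@1036"; the per-core CPU list is not
# matched because its entries are not preceded by "NAME ".
ENGINE_RE = re.compile(r"(\w+) (\d+)%@(\d+)")


def engine_loads(line):
    """Map each engine name in a tegrastats line to its utilization percent."""
    return {name: int(pct) for name, pct, _mhz in ENGINE_RE.findall(line)}


def saturated_engines(line, threshold=90):
    """Return engines at or above the threshold (90 is an arbitrary choice)."""
    return sorted(n for n, pct in engine_loads(line).items() if pct >= threshold)


# First tegrastats sample from the post, cut off after the engine fields.
line = (
    "RAM 6998/15817MB (lfb 1747x4MB) SWAP 0/7908MB (cached 0MB) "
    "CPU [15%@2265,14%@2265,25%@2265,18%@2265,15%@2265,12%@2265,16%@2265,13%@2265] "
    "EMC_FREQ 33%@2133 GR3D_FREQ 46%@1377 NVDEC 1190 NVDEC1 1190 VIC_FREQ 98%@1036"
)

print(engine_loads(line))       # {'EMC_FREQ': 33, 'GR3D_FREQ': 46, 'VIC_FREQ': 98}
print(saturated_engines(line))  # ['VIC_FREQ']
```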

Hi,
It looks like the issue is specific to the video file. sample_1080p_h264.mp4 is 1920x1080p30. Is highway.mp4 also 1920x1080p30?

Hi,
Yes, highway.mp4 is also 1080p30.
The thing is that this video contains more objects. For the first 8-10 seconds it runs at 20 fps, and then it starts dropping more frames.
When I ran a video with fewer objects, it ran at 30 fps without problems.
So is there a limit on the number of objects that can be detected properly, beyond which performance degrades?