Frame rate drops with more than 5 RTSP streams on a single GPU

muhammadfahad.zia · December 19, 2019, 12:08pm

I have been benchmarking several models with DeepStream on RTSP streams. The results indicate that I cannot run more than 5 real-time streams without a drop in frame rate, and the drop becomes significant as I add more streams.

The model I am using is ResNet-10, although the same behavior is observed with the custom YOLO implementation provided with DeepStream.

With ResNet, the FPS drops from 30 per stream (near real time) with 1 RTSP stream to 18 per stream with 10 streams. With YOLO, it drops from 30 to 8 as the number of streams goes from 1 to 10.
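It can help to read these figures as aggregate throughput rather than per-stream FPS. The following is a minimal Python sketch using only the numbers quoted above; it assumes the YOLO figures are also per stream, and the function name is purely illustrative.

```python
# Per-stream FPS figures quoted in the post above, keyed by number of streams.
# Assumption: the YOLO numbers are per-stream FPS, like the ResNet ones.
measurements = {
    "ResNet-10": {1: 30, 10: 18},
    "YOLO": {1: 30, 10: 8},
}


def aggregate_fps(per_stream_fps, num_streams):
    """Total frames per second processed across all streams."""
    return per_stream_fps * num_streams


for model, runs in measurements.items():
    for num_streams, per_stream in sorted(runs.items()):
        total = aggregate_fps(per_stream, num_streams)
        print(f"{model}: {num_streams} stream(s) x {per_stream} FPS = {total} FPS total")
```

Read this way, the total number of frames processed still rises with 10 streams (30 to 180 for ResNet, 30 to 80 for YOLO), while the per-stream rate falls. With live sources, a single stream cannot exceed the camera's own frame rate, so the 1-stream figure says little about the pipeline's ceiling. The totals are consistent with some shared stage saturating, but they do not show which one.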

Here is the deepstream config file:

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=2
kitti-track-output-dir=/nvme/test/metadata_fahad_rtsp_1

#gie-kitti-output-dir=streamscl

[tiled-display]
enable=0
rows=1
columns=1
width=1280
height=720
gpu-id=2
#(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
#(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory
#(2): nvbuf-mem-cuda-device - Allocate Device cuda memory
#(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory
#(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
#(5): nvbuf-mem-handle - Allocate Surface Handle memory, applicable for Jetson
#(6): nvbuf-mem-system - Allocate Surface System memory, allocated using calloc
nvbuf-memory-type=0

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=4
#uri=file:/dfs/AutomationWorkspace/EncodedVideos/20191016-150001/camera16/cam16Concat_28fps.mp4
uri=rtsp://153.64.131.17/stream
gpu-id=2
# (0): memtype_device   - Memory type Device
# (1): memtype_pinned   - Memory type Host Pinned
# (2): memtype_unified  - Memory type Unified
cudadec-memtype=0

[sink0]
enable=1
type=1
#1=mp4 2=mkv
#1=h264 2=h265 3=mpeg4
## only SW mpeg4 is supported right now.
qos=0
sync=0
gpu-id=2
iframeinterval=10
output-file=/software/Video_Output_Fahad/Out_RTSP_0.mp4
container=1
codec=3
source-id=0

#end

[osd]
enable=1
gpu-id=2
border-width=1
text-size=20
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Arial
process-mode=1
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=2
##Boolean property to inform muxer that sources are live
live-source=1
batch-size=4
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=1000
## Set muxer output width and height
width=1280
height=720
#num-surfaces-per-frame=31
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0

# config-file property is mandatory for any gie section.
# Other properties are optional and if set will override the properties set in
# the infer config file.
[primary-gie]
enable=1
gpu-id=2
#model-engine-file=model_b4_int8.engine
labelfile-path=labels.txt
batch-size=4
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
interval=1
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary_yoloV3_Fahad.txt

[tracker]
enable=0
tracker-width=320
tracker-height=180
#ll-lib-file=/usr/local/deepstream/libnvds_mot_iou.so
#ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_mot_klt.so
#ll-lib-file=/usr/local/deepstream/libnvds_mot_klt.so
#ll-lib-file=/usr/local/deepstream/libnvds_tracker.so
ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_nvdcf.so
#ll-config-file required for IOU only
ll-config-file=/root/deepstream_sdk_v4.0_x86_64/samples/configs/deepstream-app/tracker_config.yml
#ll-config-file=iou_config.txt
gpu-id=2
enable-batch-process=1


[tests]
file-loop=0

And here is the inference config file:

[property]
net-scale-factor=1
#0=RGB, 1=BGR
model-color-format=0
custom-network-config=/root/deepstream_sdk_v4.0_x86_64/sources/objectDetector_Yolo/yolov3.cfg
model-file=/root/deepstream_sdk_v4.0_x86_64/sources/objectDetector_Yolo/yolo-obj_20000.weights
#model-engine-file=model_b1_int8.engine
labelfile-path=labels.txt
#int8-calib-file=yolov3-calibration.table.trt5.1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
num-detected-classes=80
gie-unique-id=1
is-classifier=0
maintain-aspect-ratio=1
parse-bbox-func-name=NvDsInferParseCustomYoloV3
custom-lib-path=/root/deepstream_sdk_v4.0_x86_64/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so

These numbers do not match the throughput claimed for DeepStream. Is there a problem with my config files?
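One thing that can be checked mechanically in configs like the ones above is whether the batch sizes line up with the number of enabled sources: both [streammux] and [primary-gie] use batch-size=4 here, while the tests go up to 10 streams. The sketch below is a hypothetical helper (not part of DeepStream) built on Python's standard configparser. It assumes the commonly given guidance that batch-size should usually match the number of sources, and it is not confirmed anywhere in this thread that the mismatch explains the FPS drop.

```python
import configparser


def check_batch_sizes(config_text):
    """Return (num_sources, warnings) for a deepstream-app style config.

    Hypothetical helper: it only compares batch-size values with the
    number of enabled [sourceN] groups and flags any mismatch.
    """
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(config_text)

    # Count enabled [sourceN] groups; num-sources is honoured when present.
    num_sources = 0
    for name in parser.sections():
        if name.startswith("source") and parser[name].getint("enable", fallback=0) == 1:
            num_sources += parser[name].getint("num-sources", fallback=1)

    warnings = []
    for group in ("streammux", "primary-gie"):
        if parser.has_section(group):
            batch = parser[group].getint("batch-size", fallback=1)
            if batch != num_sources:
                warnings.append(
                    f"[{group}] batch-size={batch}, but {num_sources} source(s) enabled"
                )
    return num_sources, warnings


# Illustrative input: ten enabled RTSP sources with the batch sizes from the post.
sample = "".join(f"[source{i}]\nenable=1\ntype=4\n\n" for i in range(10))
sample += "[streammux]\nbatch-size=4\n\n[primary-gie]\nbatch-size=4\n"

num_sources, warnings = check_batch_sizes(sample)
for message in warnings:
    print(message)
```

A mismatch is not an error in itself, since the pipeline still runs, but it is a cheap thing to rule out before profiling further.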

DaneLLL · December 20, 2019, 4:24am

Hi,
It looks like you are using an x86 PC with a dGPU. Please provide information about your machine, e.g. Tesla P4 or GeForce 1080,…

Also, does it happen when running 5 local video file sources?

mchi · December 20, 2019, 4:27am

What GPU are you running on?
And, can you use INT8 instead of FP16?

Thanks!

muhammadfahad.zia · December 20, 2019, 7:14am

We are using a Linux-based x86_64 system with a Tesla V100 GPU.
And yes, it also happens with local video file sources. Using ResNet-10, the FPS drops from 70 per stream to 40 when we increase the number of local sources from 1 to 4.

INT8 leads to significant accuracy degradation so we want to stick to FP16 for now.

mchi · December 20, 2019, 10:28am

Regarding ResNet-10, are you referring to the resnet10 network at /opt/nvidia/deepstream/deepstream-4.0/samples/models/Primary_Detector/resnet10.prototxt?

muhammadfahad.zia · December 23, 2019, 5:54am

Yes, exactly.

mchi · December 23, 2019, 7:50am

We don't have a V100 on hand. Can you use the TensorRT tool trtexec to profile resnet10.prototxt on your device?
The command looks like this:

$ trtexec --deploy=resnet10.prototxt --output="conv2d_cov/Sigmoid" --batch=10 --fp16 --workspace=2048

You can find trtexec in the TensorRT package.
