DeepStream performance with 6 RTSP cameras

• Hardware Platform (Jetson / GPU): GPU (GTX 1080 Ti)
• DeepStream Version: 5.0
• JetPack Version (valid for Jetson only): N/A
• TensorRT Version: 7.2.1
• NVIDIA GPU Driver Version (valid for GPU only): 450.102.04
• Issue Type (questions, new requirements, bugs): Question
• How to reproduce the issue? (This is for bugs. Include the sample app used, the configuration file contents, the command line, and other details for reproducing.)

Six RTSP cameras, with batch-size=6 set in streammux in the config file.

• Requirement details (This is for new requirements. Include the module name, i.e. which plugin or which sample application, and the function description.)

I am using 6 RTSP cameras with deepstream-app.
I am using a custom face detector, and all the necessary things are taken care of.
The output is good.

The problem: with up to 4 cameras everything runs smoothly, but if I add 6 cameras with batch-size=6 and the KLT tracker, I can see in 4 of the stream outputs that the video runs smoothly, then freezes for roughly 400-500 ms, then continues streaming. This occurs repeatedly, every 2-3 seconds of playback.
I have enabled the plugin latency information and got the following output:

BATCH-NUM = 1104**
Source id = 4 Frame_num = 890 Frame latency = 403.126953 (ms)
Source id = 5 Frame_num = 911 Frame latency = 252.131104 (ms)
Source id = 0 Frame_num = 911 Frame latency = 131.786133 (ms)
Source id = 1 Frame_num = 907 Frame latency = 317.698975 (ms)
Source id = 3 Frame_num = 909 Frame latency = 279.772949 (ms)
**PERF: 20.08 (20.25) 20.08 (20.19) 20.03 (20.16) 20.08 (20.09) 20.08 (19.96) 20.08 (20.20)

BATCH-NUM = 1105**
Source id = 4 Frame_num = 891 Frame latency = 404.462891 (ms)
Source id = 5 Frame_num = 912 Frame latency = 218.992920 (ms)
Source id = 0 Frame_num = 912 Frame latency = 127.816895 (ms)
Source id = 1 Frame_num = 908 Frame latency = 323.374023 (ms)
Source id = 3 Frame_num = 910 Frame latency = 280.169922 (ms)
Source id = 2 Frame_num = 913 Frame latency = 309.821045 (ms)

BATCH-NUM = 1106**
Source id = 4 Frame_num = 892 Frame latency = 398.184082 (ms)
Source id = 5 Frame_num = 913 Frame latency = 224.169922 (ms)
Source id = 0 Frame_num = 913 Frame latency = 107.177979 (ms)
Source id = 1 Frame_num = 909 Frame latency = 319.852051 (ms)
Source id = 2 Frame_num = 914 Frame latency = 265.726074 (ms)

BATCH-NUM = 1107**
Source id = 4 Frame_num = 893 Frame latency = 407.282959 (ms)
Source id = 5 Frame_num = 914 Frame latency = 199.333008 (ms)
Source id = 1 Frame_num = 910 Frame latency = 317.399902 (ms)
Source id = 0 Frame_num = 914 Frame latency = 112.913818 (ms)
Source id = 3 Frame_num = 911 Frame latency = 325.961914 (ms)
Source id = 2 Frame_num = 915 Frame latency = 270.710938 (ms)

BATCH-NUM = 1108**
Source id = 4 Frame_num = 894 Frame latency = 414.575195 (ms)
Source id = 5 Frame_num = 915 Frame latency = 220.737061 (ms)
Source id = 1 Frame_num = 911 Frame latency = 329.460205 (ms)
Source id = 0 Frame_num = 915 Frame latency = 105.188232 (ms)
Source id = 3 Frame_num = 912 Frame latency = 294.232178 (ms)

BATCH-NUM = 1109**
Source id = 4 Frame_num = 895 Frame latency = 415.446045 (ms)
Source id = 5 Frame_num = 916 Frame latency = 198.602051 (ms)
Source id = 1 Frame_num = 912 Frame latency = 337.852051 (ms)
Source id = 3 Frame_num = 913 Frame latency = 301.219971 (ms)
Source id = 0 Frame_num = 916 Frame latency = 109.443115 (ms)

BATCH-NUM = 1110**
Source id = 4 Frame_num = 896 Frame latency = 415.557129 (ms)
Source id = 5 Frame_num = 917 Frame latency = 213.272217 (ms)
Source id = 1 Frame_num = 913 Frame latency = 317.322998 (ms)

BATCH-NUM = 1111**
Source id = 4 Frame_num = 897 Frame latency = 426.335205 (ms)
Source id = 5 Frame_num = 918 Frame latency = 183.387207 (ms)
Source id = 1 Frame_num = 914 Frame latency = 344.193115 (ms)
Source id = 0 Frame_num = 917 Frame latency = 124.072021 (ms)

BATCH-NUM = 1112**
Source id = 4 Frame_num = 898 Frame latency = 431.431885 (ms)
Source id = 1 Frame_num = 915 Frame latency = 358.134766 (ms)

BATCH-NUM = 1113**
Source id = 4 Frame_num = 899 Frame latency = 457.645020 (ms)
Source id = 1 Frame_num = 916 Frame latency = 352.624023 (ms)
Source id = 2 Frame_num = 916 Frame latency = 520.850098 (ms)

BATCH-NUM = 1114**
Source id = 4 Frame_num = 900 Frame latency = 466.188965 (ms)
Source id = 1 Frame_num = 917 Frame latency = 337.905029 (ms)
Source id = 2 Frame_num = 917 Frame latency = 533.125000 (ms)
Source id = 0 Frame_num = 918 Frame latency = 202.751953 (ms)

Still, I can see perf is 20 fps for all the cameras, as the cameras are giving me a 20 fps stream (at a bitrate of 1024).
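For reference, I enabled the per-frame latency logs above with deepstream-app's standard latency-measurement environment variables. A minimal launch sketch (the config file name is illustrative, not my actual file):

```shell
# Enable per-frame latency logging in deepstream-app
export NVDS_ENABLE_LATENCY_MEASUREMENT=1
# Also enable per-component (per-plugin) latency logging
export NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1
# deepstream-app -c my_6cam_config.txt   # illustrative config file name
```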

I have also played with the batched-push-timeout parameter.
live-source=1.
Placed sync=0 in all sink groups.
codec=1.
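For context, the relevant streammux settings in my config look roughly like this (a sketch with illustrative values, not my exact file):

```ini
[streammux]
live-source=1
batch-size=6
# Frame interval at 20 fps is 50000 us; I have experimented with this value
batched-push-timeout=50000
width=1920
height=1080
```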

Is this a hardware decoder issue?

Also, let me know: if I use 6 sources, do I need 6 sink groups as well?
Thanks.

There could be a lot of reasons, but from your logs I suspect your model needs to be optimized. You can also try downscaling the input to KLT.
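Downscaling the tracker input can be done in the deepstream-app config, e.g. (a sketch; the values are illustrative and the KLT library path is from a stock DS 5.0 install, so it may differ on your system):

```ini
[tracker]
enable=1
# Smaller tracker input means less work for KLT
# (widths/heights are typically kept multiples of 32)
tracker-width=640
tracker-height=384
ll-lib-file=/opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_mot_klt.so
```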

Tell me, are you able to stream 6 cameras smoothly with ResNet-50?
If you write to a file instead of displaying to the screen, what is the latency like?
What is your input size to streammux? 1280x720, 1920x1080?

Hi,
streammux is 1920x1080.

If I write to a file, I get output like:

BATCH-NUM = 0**
Batch meta not found for buffer 0x7fa878014680
BATCH-NUM = 1**
Batch meta not found for buffer 0x7fa878014790
KLT Tracker Init

BATCH-NUM = 2**
Batch meta not found for buffer 0x7fa878014460
BATCH-NUM = 3**
Batch meta not found for buffer 0x7fa8b401ac60
BATCH-NUM = 4**
Batch meta not found for buffer 0x7fa878014680
BATCH-NUM = 5**
Batch meta not found for buffer 0x7fa8d4017090
BATCH-NUM = 6**
Batch meta not found for buffer 0x7fa8b401ac60
BATCH-NUM = 7**
Batch meta not found for buffer 0x7fa8f40894e0
KLT Tracker Init

BATCH-NUM = 8**
Batch meta not found for buffer 0x7fa8b4011e40
KLT Tracker Init

BATCH-NUM = 9**
Batch meta not found for buffer 0x7fa8b408fe10
BATCH-NUM = 10**
Batch meta not found for buffer 0x7fa8b401ac60
BATCH-NUM = 11**
Batch meta not found for buffer 0x7fa8c400d440
KLT Tracker Init

With 6 cameras and resnet10 (the default primary detector), the same thing happens.

Also, sometimes I see glitchy images in arbitrary camera streams.

Hmm. You may be hitting the limits of your 1080 Ti or the limits of your network.

(I've only worked with Jetsons; I've never run DS on a desktop.)

Try setting your streammux to the native resolution of your camera feeds.

E.g.: if your cameras are 1280x720, set streammux to 1280x720.

When you upscale a camera feed via streammux, it will use 200 MB of GPU memory per feed, which in your case is 1.2 GB of memory that could be diverted to nvinfer for inferencing.
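In the deepstream-app config, matching the muxer to the native feed resolution would look like this (sketch, using the 1280x720 example above):

```ini
[streammux]
# Match the muxer output to the native camera resolution to avoid scaling
width=1280
height=720
```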

My camera feeds are also 1920x1080, so I think the 200 MB of GPU memory per feed may not apply.
I will try with a lower resolution.

I also tried with 9 cameras, batch-size=9.

The per-plugin latency output is:
Comp name = nvv4l2decoder8 in_system_timestamp = 1613118422699.666016 out_system_timestamp = 1613118423022.933105 component latency= 323.267090
Comp name = src_bin_muxer source_id = 4 pad_index = 4 frame_num = 353 in_system_timestamp = 1613118423022.983887 out_system_timestamp = 1613118423194.083008 component_latency = 171.099121
Comp name = nvv4l2decoder4 in_system_timestamp = 1613118422949.429932 out_system_timestamp = 1613118423023.010986 component latency= 73.581055
Comp name = src_bin_muxer source_id = 5 pad_index = 5 frame_num = 362 in_system_timestamp = 1613118423023.085938 out_system_timestamp = 1613118423194.083008 component_latency = 170.997070
Comp name = nvv4l2decoder7 in_system_timestamp = 1613118422857.799072 out_system_timestamp = 1613118423022.325928 component latency= 164.526855
Comp name = src_bin_muxer source_id = 6 pad_index = 6 frame_num = 362 in_system_timestamp = 1613118423022.392090 out_system_timestamp = 1613118423194.083008 component_latency = 171.690918
Comp name = nvv4l2decoder3 in_system_timestamp = 1613118422699.558105 out_system_timestamp = 1613118423022.782959 component latency= 323.224854
Comp name = src_bin_muxer source_id = 7 pad_index = 7 frame_num = 418 in_system_timestamp = 1613118423022.996094 out_system_timestamp = 1613118423194.083008 component_latency = 171.086914
Comp name = nvv4l2decoder5 in_system_timestamp = 1613118423047.485107 out_system_timestamp = 1613118423052.783936 component latency= 5.298828
Comp name = src_bin_muxer source_id = 0 pad_index = 0 frame_num = 363 in_system_timestamp = 1613118423052.864014 out_system_timestamp = 1613118423194.083008 component_latency = 141.218994
Comp name = nvv4l2decoder1 in_system_timestamp = 1613118423147.846924 out_system_timestamp = 1613118423180.287109 component latency= 32.440186
Comp name = src_bin_muxer source_id = 8 pad_index = 8 frame_num = 373 in_system_timestamp = 1613118423180.388916 out_system_timestamp = 1613118423194.083008 component_latency = 13.694092
Comp name = nvv4l2decoder2 in_system_timestamp = 1613118422938.599121 out_system_timestamp = 1613118423183.983887 component latency= 245.384766
Comp name = src_bin_muxer source_id = 3 pad_index = 3 frame_num = 372 in_system_timestamp = 1613118423184.086914 out_system_timestamp = 1613118423194.083984 component_latency = 9.997070
Comp name = nvv4l2decoder0 in_system_timestamp = 1613118422948.499023 out_system_timestamp = 1613118423185.163086 component latency= 236.664062
Comp name = src_bin_muxer source_id = 2 pad_index = 2 frame_num = 374 in_system_timestamp = 1613118423185.229004 out_system_timestamp = 1613118423194.083984 component_latency = 8.854980
Comp name = nvv4l2decoder6 in_system_timestamp = 1613118423175.071045 out_system_timestamp = 1613118423193.851074 component latency= 18.780029
Comp name = src_bin_muxer source_id = 1 pad_index = 1 frame_num = 361 in_system_timestamp = 1613118423193.947021 out_system_timestamp = 1613118423194.083984 component_latency = 0.136963
Comp name = primary_gie in_system_timestamp = 1613118423194.184082 out_system_timestamp = 1613118423205.724121 component latency= 11.540039
Comp name = tracking_tracker in_system_timestamp = 1613118423207.571045 out_system_timestamp = 1613118423217.670898 component latency= 10.099854
Comp name = ivcore0 in_system_timestamp = 1613118423217.748047 out_system_timestamp = 0.000000 component latency= -1613118423217.748047
Comp name = tiled_display_tiler in_system_timestamp = 1613118423217.822021 out_system_timestamp = 1613118423219.040039 component latency= 1.218018
Comp name = osd_conv in_system_timestamp = 1613118423219.293945 out_system_timestamp = 1613118423219.355957 component latency= 0.062012
Comp name = nvosd0 in_system_timestamp = 1613118423219.430908 out_system_timestamp = 1613118423219.439941 component latency= 0.009033
Source id = 4 Frame_num = 353 Frame latency = 519.936035 (ms)
Source id = 5 Frame_num = 362 Frame latency = 270.172119 (ms)
Source id = 6 Frame_num = 362 Frame latency = 361.802979 (ms)
Source id = 7 Frame_num = 418 Frame latency = 520.043945 (ms)
Source id = 0 Frame_num = 363 Frame latency = 172.116943 (ms)
Source id = 8 Frame_num = 373 Frame latency = 71.755127 (ms)
Source id = 3 Frame_num = 372 Frame latency = 281.002930 (ms)
Source id = 2 Frame_num = 374 Frame latency = 271.103027 (ms)
Source id = 1 Frame_num = 361 Frame latency = 44.531006 (ms)

Performance:

**PERF: FPS 0 (Avg) FPS 1 (Avg) FPS 2 (Avg) FPS 3 (Avg) FPS 4 (Avg) FPS 5 (Avg) FPS 6 (Avg) FPS 7 (Avg) FPS 8 (Avg)
**PERF: 20.24 (20.02) 19.75 (20.01) 19.75 (20.01) 18.76 (20.00) 20.24 (19.97) 20.24 (20.04) 21.23 (20.05) 23.70 (23.67) 20.22 (20.01)
**PERF: 17.88 (20.01) 18.38 (20.00) 19.37 (20.00) 19.37 (20.00) 17.88 (19.96) 17.39 (20.02) 16.89 (20.03) 22.85 (23.67) 19.86 (20.01)
**PERF: 20.79 (20.01) 20.28 (20.00) 19.77 (20.00) 20.28 (20.00) 21.29 (19.96) 21.80 (20.03) 22.31 (20.04) 23.32 (23.66) 19.32 (20.00)
**PERF: 20.18 (20.02) 20.67 (20.01) 21.16 (20.01) 20.67 (20.00) 19.69 (19.96) 19.69 (20.03) 18.70 (20.03) 24.12 (23.67) 20.58 (20.01)
**PERF: 20.67 (20.02) 20.67 (20.01) 20.16 (20.01) 20.67 (20.01) 20.67 (19.97) 20.67 (20.04) 22.68 (20.05) 23.69 (23.67) 19.77 (20.01)
**PERF: 20.06 (20.02) 20.06 (20.01) 20.06 (20.01) 20.06 (20.01) 20.06 (19.97) 20.06 (20.03) 17.55 (20.03) 24.07 (23.67) 20.51 (20.01)
**PERF: 20.91 (20.02) 20.41 (20.01) 19.91 (20.01) 19.91 (20.01) 20.91 (19.97) 20.91 (20.04) 22.90 (20.05) 23.89 (23.67) 19.97 (20.01)
**PERF: 18.18 (20.01) 18.69 (20.00) 19.70 (20.01) 19.19 (20.00) 18.69 (19.96) 18.18 (20.03) 16.16 (20.03) 20.23 (23.67) 19.70 (20.01)

I'm not sure if nvinfer is overloaded.

The only thing I can observe is the glitchiness and frame freezing (for ~100 ms), and it is random.

Please use “nvidia-smi dmon” to monitor your GPU load and usage while running the case.

I can see the output is:

gpu   pwr  gtemp  mtemp    sm   mem   enc   dec  mclk  pclk
Idx     W      C      C     %     %     %     %   MHz   MHz
0    99    55     -    33     8     0    17  5005  1873
0   104    55     -    24     7     0    15  5005  1873
0    99    56     -    32     7     0    32  5005  1873
0   102    56     -    22     6     0    13  5005  1860
0   121    56     -    26     7     0    16  5005  1860
0    99    56     -    22     7     0    15  5005  1860
0   132    56     -    34     8     0    30  5005  1860
0    95    56     -    30     9     0    12  5005  1860
0   112    57     -    45    15     0    32  5005  1860
0    97    57     -    21     7     0    11  5005  1860
0   158    57     -    25     8     0    21  5005  1860
0    94    57     -    18     6     0    12  5005  1860
0   112    57     -    44    12     0    42  5005  1860

Is the RTSP sources' latency stable and short enough? The GPU and decoder load are not high. What is your cameras' FPS?

All cameras are at 20 fps (1080p). I have set them to constant bitrate.

Also, when I increase the number of cameras from 9 to 10, the FPS goes out of sync and frames hang / images become glitchy.

I just wanted to know whether the model or the hardware is the bottleneck.

According to the nvidia-smi result, neither the model nor the hardware is the bottleneck.

I can share a video file if you want.

In your original post you are using RTSP streams, not local files. A local file is quite different from a real-time stream. What exactly did you test the performance with?

I meant the output video file (to see what is happening), where the frame freezing occurs and then the stream continues. RTSP is being used as the input source.

The live source may not be smooth.
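If the sources are jittery, one thing worth trying is a larger jitter buffer on the RTSP sources. In a deepstream-app config that would look roughly like this (a sketch; as I understand it, the `latency` key sets the rtspsrc jitter buffer in milliseconds, and the source group number and URI are illustrative):

```ini
[source0]
enable=1
# type=4 selects an RTSP/URI source in deepstream-app
type=4
uri=rtsp://<camera-address>
# Larger jitter buffer smooths network jitter at the cost of added delay
latency=200
```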

OK, noted.
I didn't know an RTSP stream could behave like that.
But the issue is: when I use 6 cameras it runs smoothly, but when I add 9 cameras (adjusting batch-size accordingly), it happens, and it appears randomly on any of the camera streams.