Batch-size has marginal effect on multi-source performance

cpeskin · July 26, 2021, 8:03pm

• Hardware Platform (Jetson / GPU) - Jetson TX2
• DeepStream Version - 5.0
• JetPack Version (valid for Jetson only) - 4.4, L4T 32.4.3
• TensorRT Version - 7.1.3
• Issue Type( questions, new requirements, bugs) - question

Hello,

I am currently benchmarking the multi-source performance of our DeepStream 5.0 application on a Jetson TX2. Per the DeepStream best practices, the batch-size parameter should be set equal to the number of input sources to increase performance. We’re using the base pruned trafficcamnet model with RTSP inputs (20 fps) and setting the batch-size parameter in both the [streammux] and [primary-gie] groups. We are seeing the following results:

Batch size	# of streams	Avg FPS (stream 0)	Avg FPS (stream 1)	Avg FPS (stream 2)	Avg FPS (stream 3)
1	1	19.97
1	2	17.19	17.17
1	3	11.13	11.09	11.04
1	4	8.19	8.13	8.05	8.01
2	1	19.96
2	2	18.95	18.95
2	3	12.47	12.44	12.39
2	4	9.2	9.15	9.09	9.07
3	1	19.95
3	2	17.34	17.27
3	3	12.96	12.96	12.96
3	4	9.64	9.62	9.55	9.54
4	1	19.57
4	2	17.74	17.74
4	3	12.07	12.06	12.06
4	4	9.76	9.76	9.76	9.76

We are seeing some performance increase when matching the batch size to the number of input streams, but this increase is not very significant. This is running inside of a Docker container with all sinks set to fakesink.
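One way to read the table is to sum the per-stream rates into an aggregate throughput for each configuration. The following is a minimal Python sketch that uses only the numbers above; the `results` dictionary and the `aggregate_fps` helper are names introduced here for illustration and are not part of DeepStream.

```python
# Per-stream average FPS copied from the table above,
# keyed by (batch_size, number_of_streams).
results = {
    (1, 1): [19.97],
    (1, 2): [17.19, 17.17],
    (1, 3): [11.13, 11.09, 11.04],
    (1, 4): [8.19, 8.13, 8.05, 8.01],
    (2, 1): [19.96],
    (2, 2): [18.95, 18.95],
    (2, 3): [12.47, 12.44, 12.39],
    (2, 4): [9.2, 9.15, 9.09, 9.07],
    (3, 1): [19.95],
    (3, 2): [17.34, 17.27],
    (3, 3): [12.96, 12.96, 12.96],
    (3, 4): [9.64, 9.62, 9.55, 9.54],
    (4, 1): [19.57],
    (4, 2): [17.74, 17.74],
    (4, 3): [12.07, 12.06, 12.06],
    (4, 4): [9.76, 9.76, 9.76, 9.76],
}

SOURCE_FPS = 20  # each RTSP input delivers 20 fps


def aggregate_fps(per_stream):
    """Total frames per second processed across all streams."""
    return round(sum(per_stream), 2)


for streams in range(1, 5):
    unbatched = aggregate_fps(results[(1, streams)])
    matched = aggregate_fps(results[(streams, streams)])
    offered = SOURCE_FPS * streams  # frames per second the sources supply
    print(
        f"{streams} stream(s): offered {offered} fps, "
        f"batch-size 1 -> {unbatched} fps total, "
        f"batch-size {streams} -> {matched} fps total"
    )
```

With four streams, for example, the total goes from 32.38 FPS at batch-size 1 to 39.04 FPS at batch-size 4, while the sources supply 80 FPS. For two or more streams every configuration totals roughly 33 to 39 FPS, which would be consistent with the pipeline hitting a throughput ceiling rather than with a batching misconfiguration, although the table alone cannot confirm that.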

My question is: is this marginal performance increase expected, or am I missing something and not taking full advantage of batching?

bcao · July 27, 2021, 10:38am

What’s the streammux batch size in your table? Is it the same as the pgie batch size?

Also, can you test the model FPS via trtexec?

cpeskin · July 27, 2021, 2:00pm

Hi @bcao , thanks for the response!

Yes, the batch size column in the table above represents the batch size set in both the streammux and the pgie groups.
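As a sanity check on that setup, the sketch below parses a deepstream-app style config and reports the number of enabled sources alongside the batch-size in the [streammux] and [primary-gie] groups. It is a rough illustration under stated assumptions: the sample config text, the placeholder RTSP URIs, and the `check_batch_sizes` helper are invented here, and a real config file may contain syntax that Python's `configparser` does not accept.

```python
import configparser
import re

# Hypothetical config fragment for illustration only; the URIs are placeholders.
SAMPLE_CONFIG = """
[source0]
enable=1
type=4
uri=rtsp://camera-0.example/stream

[source1]
enable=1
type=4
uri=rtsp://camera-1.example/stream

[streammux]
live-source=1
batch-size=2

[primary-gie]
enable=1
batch-size=2
"""


def check_batch_sizes(config_text):
    """Return (enabled_sources, streammux_batch, pgie_batch) from config text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    # Count [sourceN] groups that are switched on.
    enabled_sources = sum(
        1
        for name in parser.sections()
        if re.fullmatch(r"source\d+", name)
        and parser[name].getint("enable", fallback=0) == 1
    )
    mux_batch = parser["streammux"].getint("batch-size")
    pgie_batch = parser["primary-gie"].getint("batch-size")
    return enabled_sources, mux_batch, pgie_batch


sources, mux_batch, pgie_batch = check_batch_sizes(SAMPLE_CONFIG)
if sources == mux_batch == pgie_batch:
    print(f"batch-size matches the {sources} enabled source(s)")
else:
    print(f"mismatch: sources={sources}, streammux={mux_batch}, pgie={pgie_batch}")
```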

Regarding testing the model using trtexec, I’m not seeing any option for tlt models (as I am using trafficcamnet) and per this post, trtexec doesn’t support tlt/etlt models.

Are there any other tools that we could use to benchmark the model outside of our application?
What is the expected performance increase when adjusting batch-size to match the number of input sources? Can you confirm whether or not the small performance increase we’re seeing is expected, or if there should be a greater difference?

bcao · August 3, 2021, 8:43am

I rechecked your table. It should be expected, since your source streams are 20 fps. Which part do you think is not expected?

cpeskin · August 5, 2021, 1:54pm

Hi @bcao,

We thought that increasing the batch-size (specifically to match the number of streams) would result in a greater performance increase than 1-2 FPS. I do see that, for each number of streams in my table above (1-4), FPS is highest when batch size == number of streams, but I’d expect batching to have a greater effect on performance.

However, if you think that 1-2 FPS is the expected performance increase when changing batch-size, then I can explore other ways to increase the multi-stream performance of our DeepStream app.
Do you have any recommendations to increase pipeline FPS beyond what’s explicitly mentioned in the DeepStream best practices?

bcao · August 13, 2021, 7:32am

Please move to the latest version of DS.
Increasing the batch size will not help, as the GPU is probably saturated. You can check this with the tegrastats utility. Try enabling the max clock setting. Performance is also model dependent. Check the perf numbers we have reported for trafficcamnet: https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_Performance.html
Also, can you please share your config files for us to check?
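To illustrate the tegrastats check, here is a small Python sketch that pulls the GPU load out of one tegrastats line. It assumes the load appears as a `GR3D_FREQ <percent>%` field, which is how tegrastats commonly prints it on Jetson devices, but the exact layout can differ between L4T releases. The sample line is made up for illustration and is not a measurement from this thread; `gpu_load_percent` is a hypothetical helper name.

```python
import re

# Illustrative line in the style of tegrastats output (not real measured data).
SAMPLE_LINE = (
    "RAM 2048/7860MB (lfb 120x4MB) SWAP 0/3930MB (cached 0MB) "
    "CPU [35%@2035,0%@2035,0%@2035,28%@2035,30%@2035,31%@2035] "
    "EMC_FREQ 12%@1866 GR3D_FREQ 99%@1300"
)

# Assumed field format: "GR3D_FREQ <percent>%", optionally followed by "@<MHz>".
GR3D_PATTERN = re.compile(r"GR3D_FREQ (\d+)%")


def gpu_load_percent(line):
    """Return the GPU load percentage from a tegrastats line, or None if absent."""
    match = GR3D_PATTERN.search(line)
    return int(match.group(1)) if match else None


load = gpu_load_percent(SAMPLE_LINE)
if load is None:
    print("no GR3D_FREQ field found; the output format may differ")
elif load >= 95:  # arbitrary threshold chosen for this sketch
    print(f"GPU load {load}%: likely saturated")
else:
    print(f"GPU load {load}%: headroom remains")
```

In practice the lines would come from running `sudo tegrastats` while the pipeline is processing streams. A GPU load that stays near 100% across batch sizes would support the saturation explanation.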
