• Hardware Platform (Jetson / GPU) - Jetson TX2 • DeepStream Version - 5.0 • JetPack Version (valid for Jetson only) - 4.4, L4T 32.4.3 • TensorRT Version - 7.1.3 • Issue Type( questions, new requirements, bugs) - question
Hello,
I am currently working to benchmark multi-source performance of our Deepstream 5.0 application on a Jetson TX2 device. Per the Deepstream best practices, I can see that we should be setting the batch-size parameter equal to the number of input sources to increase performance. We’re using the base pruned trafficcamnet model with RTSP inputs (20fps) and are setting the batch-size parameter in [streammux] and [primary-gie] groups and are seeing the following results:
Batch Size
# of streams
average FPS (0)
average FPS (1)
average FPS (2)
average FPS (3)
1
1
19.97
1
2
17.19
17.17
1
3
11.13
11.09
11.04
1
4
8.19
8.13
8.05
8.01
2
1
19.96
2
2
18.95
18.95
2
3
12.47
12.44
12.39
2
4
9.2
9.15
9.09
9.07
3
1
19.95
3
2
17.34
17.27
3
3
12.96
12.96
12.96
3
4
9.64
9.62
9.55
9.54
4
1
19.57
4
2
17.74
17.74
4
3
12.07
12.06
12.06
4
4
9.76
9.76
9.76
9.76
We are seeing some performance increase when matching the batch size to the number of input streams, but this increase is not very significant. This is running inside of a Docker container with all sinks set to fakesink.
My question is: is this marginal performance increase expected, or am I missing something and not taking full advantage of batching?
Yes, the batch size column in the table above represents the batch size set in both the streammux and the pgie groups.
Regarding testing the model using trtexec, I’m not seeing any option for tlt models (as I am using trafficcamnet) and per this post, trtexec doesn’t support tlt/etlt models.
Is there any other tools that we could use to benchmark the model outside of our application?
What is the expected performance increase when adjusting batch-size to match the number of input sources? Can you confirm whether or not the small performance increase we’re seeing is expected, or if there should be a greater difference?
We thought that increasing the batch-size (specifically to match the number of streams) would result in a greater performance increase than 1-2FPS. I do see that the performance (FPS) for each number of streams in my table above (1-4) is greatest when batch size == number of streams, but I’d expect batching to have a greater effect on performance.
However, if you think that 1-2FPS is the expected performance increase when changing batch-size then I can try to explore other ways to increase the multi-stream performance of our Deepstream app.
Do you have any recommendations to increase pipeline FPS outside of what’s explicitly mentioned in Deepstream best practices?