Nvinfer batch-size from video file input

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) Jetson
• DeepStream Version 6.2-triton docker container
• JetPack Version (valid for Jetson only) 5.1.1
• TensorRT Version 8.5.2
• NVIDIA GPU Driver Version (valid for GPU only)

Hello everyone, I’ve read a lot of topics in this forum and tried everything, but I couldn’t reach the desired results. Sorry for bringing up this topic again. I’m building the engine with dynamic shapes: “min: 1x3x1088x1920, opt: 4x3x1088x1920, max: 8x3x1088x1920”. When I set batch-size=8 in config_primary.txt and in nvstreammux, inference takes 1 min 17 s; if I choose batch-size=1, inference takes 34 s. What am I doing wrong? Also, when I choose a batch size greater than 1, nvosd draws more than one bounding box per object, so something is definitely going wrong.
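For reference, a minimal sketch of how the two batch sizes can be kept aligned in a DeepStream Python app (assuming the GStreamer Python bindings; the element names and values are illustrative, not the exact pipeline from the screenshots):

```python
# Sketch only: align the nvstreammux and nvinfer batch-size in a Python app.
# The same value should also appear as batch-size in config_primary.txt,
# otherwise the property set below overrides the config-file value.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

BATCH_SIZE = 8  # 1 vs. 8 is the comparison discussed in this thread

streammux = Gst.ElementFactory.make("nvstreammux", "mux")
streammux.set_property("batch-size", BATCH_SIZE)
streammux.set_property("width", 1920)
streammux.set_property("height", 1088)
streammux.set_property("batched-push-timeout", 40000)  # microseconds

pgie = Gst.ElementFactory.make("nvinfer", "primary-infer")
pgie.set_property("config-file-path", "config_primary.txt")
pgie.set_property("batch-size", BATCH_SIZE)  # overrides the value in the config file
```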
Pipeline: (screenshot attached)

Batch-size=1 stdout: (screenshot attached)

Batch-size=8 stdout: (screenshot attached)

Batch-size=8 nvosd output: (screenshot attached)

EDIT 1: The nvosd output issue was solved by adding nvstreamdemux to the pipeline (screenshot attached).

Bonus question: What is the best way to measure the performance of a DeepStream application?

https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_FAQ.html#what-is-the-difference-between-batch-size-of-nvstreammux-and-nvinfer-what-are-the-recommended-values-for-nvstreammux-batch-size

https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_Performance.html#configuration-file-settings-for-performance-measurement
https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_Performance.html#jetson

I’m doing the same as written there, but it slows down instead of speeding up.

I’m asking how to “measure” the performance of a DeepStream application.

Do you actually input 8 sources while you set batch size to 8 for nvinfer and nvstreammux? Have you set the sink to fakesink? Have you disabled nvosd and nvmultistreamtiler?
Even with the data you have given us: the batch-size 1 pipeline can only handle one video, and it takes 34 seconds, so 8 videos would take 34 s x 8 = 272 s = 4 min 32 s. After you set the batch size to 8, the pipeline can handle 8 videos simultaneously and takes 1 min 17 s, which is much less than 4 min 32 s.
Batching saves you time.

If you are using the deepstream-app sample application, it can output the pipeline FPS.
If you are using your own DeepStream application, you may refer to
DeepStream SDK FAQ - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums

To measure the pipeline performance, you need to follow the Performance — DeepStream 6.4 documentation (nvidia.com) to disable nvosd and nvmultistreamtiler, and a fakesink with “sync=0” will disable the sink’s clock synchronization, which may otherwise affect the pipeline.
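A rough sketch of that measurement setup (hypothetical names; it simply counts buffers reaching a fakesink with clock sync disabled, not an official benchmarking tool):

```python
# Count buffers reaching the sink pad and print an average rate.
# Downstream of nvstreammux each buffer is one batch, so multiply by the
# batch size (or probe after nvstreamdemux) to get per-frame FPS.
import time
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

state = {"frames": 0, "start": None}

def fps_probe(pad, info, user_data):
    if state["start"] is None:
        state["start"] = time.time()
    state["frames"] += 1
    if state["frames"] % 100 == 0:
        elapsed = time.time() - state["start"]
        print(f"~{state['frames'] / elapsed:.1f} buffers/s at the sink")
    return Gst.PadProbeReturn.OK

sink = Gst.ElementFactory.make("fakesink", "measurement-sink")
sink.set_property("sync", False)  # "sync=0": do not sync to the pipeline clock
sink.get_static_pad("sink").add_probe(Gst.PadProbeType.BUFFER, fps_probe, None)
```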

It is the same single video: if I set batch-size=1, inference takes 34 seconds; if I set batch-size=8, it takes 1 min 17 seconds.

I use 1 input video. The documentation says “We recommend that the nvstreammux’s batch-size be set to either number of sources linked to it or the primary nvinfer’s batch-size.”, so I set it to the primary nvinfer’s batch-size.

I set batch-size=2 in both nvstreammux and nvinfer. With 2 input sources the throughput is 40 FPS; however, with 1 input source the throughput is 20 FPS. If I don’t have 32 input sources, can I not use a batch size of 32?

I think that if there is only 1 source and the batch size is greater than 1, the remaining slots in the batch are filled with dummy data. Am I correct? I want the batch to be filled from 1 source, without dummy frames.

Yes

Yes

A batch size of 8 is for 8 videos, not for one video. It is a waste to set batch size 8 with one video.

The batch size (for both nvinfer and nvstreammux) should be aligned with the number of sources if you want to get the best performance.

Hi Fiona, I’ve found something in the documentation for the new nvstreammux. There is a key called max-same-source-frames. Its default value is 1, and it is probably the same in the current nvstreammux. If I use the new nvstreammux and set its batch size to 8, would it batch 8 frames from a single video?

It will try to batch the 8 frames from the same video, but this is not guaranteed, since there are also other limitations such as “overall-max-fps-n”, “overall-max-fps-d”, “overall-min-fps-n”, “overall-min-fps-d”, “adaptive-batching”, “max-fps-n”, “max-fps-d”, …
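If it helps, a hypothetical sketch of what such a configuration could look like, using only the keys named above; the USE_NEW_NVSTREAMMUX and config-file-path usage and all values are assumptions to verify against your DeepStream version:

```python
# Hypothetical sketch: select the new nvstreammux and point it at a config file
# that allows several frames from the same source in one batch.
# All values are illustrative; check them against the new nvstreammux docs.
import os
os.environ["USE_NEW_NVSTREAMMUX"] = "yes"  # must be set before the plugin is loaded

MUX_CONFIG = """\
[property]
adaptive-batching=1
max-same-source-frames=8
overall-max-fps-n=120
overall-max-fps-d=1
overall-min-fps-n=5
overall-min-fps-d=1
"""

with open("new_mux_config.txt", "w") as f:
    f.write(MUX_CONFIG)

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

streammux = Gst.ElementFactory.make("nvstreammux", "mux")  # new mux: no width/height
streammux.set_property("batch-size", 8)
streammux.set_property("config-file-path", "new_mux_config.txt")
```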

Do you mean there is no such configuration that guarantees that?

I mean that you need to configure all of these parameters with proper values, according to your actual sources, to guarantee what you want.

And in such a situation the cost is added delay.

In my case, the inference speed is 60 FPS and the decoding speed is 200 FPS; therefore, for batch size 32, inference takes around 500 ms and batching should take around 150 ms. If you are decoding frames from a video file, it doesn’t matter whether you have 32 sources or 1 source; you will get the same decoding throughput. Apparently, the developers are trying to make this possible with the new streammux; however, I don’t understand why it isn’t a default option, to be honest.
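(For reference, the back-of-envelope arithmetic behind those figures, using the rates stated above:)

```python
# Rough arithmetic only, using the rates quoted in the post.
infer_fps = 60    # stated inference throughput
decode_fps = 200  # stated decode throughput
batch = 32

print(f"inference per batch of {batch}: ~{batch / infer_fps * 1000:.0f} ms")     # ~533 ms
print(f"decoding/batching {batch} frames: ~{batch / decode_fps * 1000:.0f} ms")  # ~160 ms
```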

For live streams, you can never get successive frames at the same time; you can only get them one by one, according to the frame rate or the timestamps. It is necessary to wait for the frames in order to form a batch, and this has nothing to do with the decoding or inference speed. Combining frames from different streams is more efficient than combining frames from a single stream. Multiple streams are the main scenario for DeepStream.


I understand, now it is clear. Thank you for the explanation.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.