Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU): Jetson
• DeepStream Version: 6.2-triton docker container
• JetPack Version (valid for Jetson only): 5.1.1
• TensorRT Version: 8.5.2
• NVIDIA GPU Driver Version (valid for GPU only):
Hello everyone, I’ve read a lot of topics in this forum and tried everything, but I couldn’t reach the desired results. Sorry for bringing up this topic again. I’m building the engine with dynamic shapes: “min: 1x3x1088x1920, opt: 4x3x1088x1920, max: 8x3x1088x1920”. When I set batch-size=8 in config_primary.txt and in nvstreammux, inference takes 1 min 17 s; if I choose batch-size=1, inference takes 34 s. What am I doing wrong? Also, when I choose a batch size of more than 1, nvosd draws more than one bbox per object, so something is definitely going wrong.
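For reference, this is roughly how I build and reference such a dynamic-shape engine (a minimal sketch; the ONNX file name, the input tensor name "input", and the engine file name are placeholders, not my actual ones):

trtexec --onnx=model.onnx \
        --minShapes=input:1x3x1088x1920 \
        --optShapes=input:4x3x1088x1920 \
        --maxShapes=input:8x3x1088x1920 \
        --saveEngine=model_b8.engine

# config_primary.txt (relevant keys only)
[property]
model-engine-file=model_b8.engine
batch-size=8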
Pipeline:
Do you actually feed 8 sources when you set the batch size to 8 for nvinfer and nvstreammux? Have you set the sink to fakesink? Have you disabled nvosd and nvmultistreamtiler?
Even with the data you have given us, the batch-size 1 pipeline can only handle one video, and it takes 34 seconds; 8 videos would take 34 s x 8 = 272 s = 4 min 32 s. After you set the batch size to 8, the pipeline handles 8 videos simultaneously in 1 min 17 s, which is much less than 4 min 32 s.
Batching saved you time.
To measure the pipeline performance, you need to follow Performance — DeepStream documentation 6.4 documentation (nvidia.com) and disable nvosd and nvmultistreamtiler. A fakesink with “sync=0” will also disable the sink’s clock synchronization, which can otherwise affect the pipeline.
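If you use deepstream-app, a minimal sketch of that measurement setup would look like this (section names follow the standard deepstream-app config format; this assumes your pipeline is driven by such a config):

[tiled-display]
enable=0

[osd]
enable=0

[sink0]
enable=1
# type=1 is fakesink; sync=0 disables clock synchronization
type=1
sync=0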
It is the same single video: if I set batch-size=1, inference takes 34 seconds; if I set batch-size=8, it takes 1 min 17 seconds.
I use 1 input video. The documentation says “We recommend that the nvstreammux’s batch-size be set to either number of sources linked to it or the primary nvinfer’s batch-size.”, so I set it to the primary nvinfer’s batch-size.
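For reference, a minimal sketch of the relevant streammux section (the width/height/timeout values here are illustrative assumptions, not copied from my actual config):

[streammux]
batch-size=8
# with fewer sources than batch-size, the old mux pushes a partial batch after this timeout (microseconds)
batched-push-timeout=40000
width=1920
height=1088
live-source=0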
I set batch-size=2 in both nvstreammux and nvinfer. If I feed 2 input sources, throughput is 40 FPS; if I feed 1 input source, throughput is 20 FPS. If I don’t have 32 input sources, can I not use a batch size of 32?
I think that if there is only 1 source and the batch-size is more than 1, it fills the other slots in the batch with dummy data. Am I correct? I want the batch to be filled from 1 source without dummy frames.
Hi Fiona, I’ve found something in the new nvstreammux documentation. There is a key max-same-source-frames. The default value is 1, and the behavior is probably the same in the current nvstreammux. If I use the new nvstreammux and set its batch size to 8, would it batch 8 frames from a single video?
It will try to batch the 8 frames from the same video, but it does not guarantee it, since there are also other limitations such as “overall-max-fps-n”, “overall-max-fps-d”, “overall-min-fps-n”, “overall-min-fps-d”, “adaptive-batching”, “max-fps-n”, “max-fps-d”, …
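For completeness, a minimal sketch of how these properties are typically set for the new nvstreammux (enabled by exporting USE_NEW_NVSTREAMMUX=yes and passing a mux config file through the element’s config-file-path property; the values below are illustrative, not a recommendation):

# mux_config.txt
[property]
algorithm-type=1
batch-size=8
# allow up to 8 frames from the same source in one batch
max-same-source-frames=8
adaptive-batching=1
overall-max-fps-n=120
overall-max-fps-d=1
overall-min-fps-n=5
overall-min-fps-d=1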
In my case, the inference speed is 60 FPS and the decoding speed is 200 FPS, so for a batch size of 32, inference takes around 500 ms and batching should take around 150 ms. If you are decoding frames from a video file, it doesn’t matter whether you have 32 sources or 1 source; you will get the same decoding throughput. Apparently the developers are trying to make this possible with the new streammux, but I don’t understand why it isn’t a default option, to be honest.
For a live stream, you can never get successive frames at the same time; you can only get them one by one according to the framerate or the timestamps. Waiting for the frames to assemble a batch is unavoidable, and it has nothing to do with decoding or inference speed. Combining frames from different streams is more efficient than combining frames from a single stream. Multiple streams are the main scenario for DeepStream.