Adding sources halves fps

• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 6.1
• JetPack Version (valid for Jetson only)
• TensorRT Version: 8.2.5-1+cuda11.4
• NVIDIA GPU Driver Version (valid for GPU only): 510.73.05
• Issue Type( questions, new requirements, bugs): Question/bug
I’m currently using the example from the Deepstream python apps called

I have modified the script to use a custom YoloR following the instructions in this other repository: GitHub - marcoslucianops/DeepStream-Yolo: NVIDIA DeepStream SDK 6.1 / 6.0.1 / 6.0 configuration for YOLO models

I have also added code from other examples to get the number of fps and send it to the osd.

The issue I’m having is that the per-stream inference speed halves when I add a new source. I have several H.264 RTSP streams running at 15 fps. When I run the pipeline with just one stream, it runs at 15 fps and the GPU sits at 10-12% utilisation. When I add a second stream, the overall inference speed and GPU utilisation remain the same, which essentially halves the fps of each stream. The same thing happens again when I double the number of streams: the total stays at 15 fps and 10-12% utilisation.

1. How do you get the number of fps?
2. When you add a second stream, do you mean the output fps should be 30 fps?

Hello, to answer your questions:

  1. I’m getting fps with the PERF_DATA class in the common/ script, so it gives me a number for each stream individually.
  2. I mean the output should be two 15 fps streams; instead I was getting two 7.5 fps streams.
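The numbers reported above are consistent with a simple toy model, sketched below. It assumes the pipeline pushes a roughly fixed number of batches per second (about 15 here, taken from the observed total) and that frames are shared evenly between sources. The function name and parameters are hypothetical and are not part of DeepStream.

```python
def per_stream_fps(source_fps, batches_per_sec, batch_size, num_sources):
    """Toy estimate of the fps each stream receives.

    Assumption: the pipeline delivers `batches_per_sec` batches per second
    regardless of how many frames each batch carries, and frames are split
    evenly across sources. A stream can never exceed its own source rate.
    """
    total_frames_per_sec = batches_per_sec * batch_size
    return min(float(source_fps), total_frames_per_sec / num_sources)


# Batch size left at 1: the total stays at 15 fps and is shared between streams.
one_stream = per_stream_fps(15, 15, batch_size=1, num_sources=1)
two_streams = per_stream_fps(15, 15, batch_size=1, num_sources=2)
four_streams = per_stream_fps(15, 15, batch_size=1, num_sources=4)

# Batch size equal to the number of sources: each stream gets its full rate.
two_streams_batched = per_stream_fps(15, 15, batch_size=2, num_sources=2)

print(one_stream, two_streams, four_streams, two_streams_batched)
```

This is only an arithmetic illustration of the reported behaviour, not a measurement of the actual pipeline.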

I somehow managed to fix this over the weekend, although I’m not sure I understand the fix. I set the streammuxer batch size to the number of sources and that did the trick. I don’t really understand why the streammuxer would not provide more frames when there is capacity to process them.

Yes, you can modify batch-size to improve performance. The streammuxer forms a batched buffer of batch-size frames. If batch-size = 1, the streammuxer pushes one frame at a time, which increases the number of interactions between the CPU and GPU and does not take advantage of the GPU's parallelism.
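A minimal sketch of the fix described above, assuming the pipeline already has an `nvstreammux` element and an `nvinfer` element, as in the DeepStream Python sample apps. The helper `configure_batching` and the `StandInElement` class are hypothetical: the stand-in only mimics the `set_property` / `get_property` calls of a GStreamer element so the snippet runs without DeepStream installed. In a real script you would pass the actual elements instead. Deriving the push timeout from the source frame rate is also an assumption, not a recommendation from the thread.

```python
class StandInElement:
    """Hypothetical stand-in for a Gst.Element (only property access)."""

    def __init__(self, **props):
        self._props = dict(props)

    def set_property(self, name, value):
        self._props[name] = value

    def get_property(self, name):
        return self._props[name]


def configure_batching(streammux, pgie, num_sources, source_fps=15):
    """Match batch sizes to the number of sources.

    streammux: the nvstreammux element (or a stand-in)
    pgie:      the nvinfer element (or a stand-in)
    """
    # One batched buffer should carry one frame from every source.
    streammux.set_property("batch-size", num_sources)

    # Assumption: wait at most one frame interval (in microseconds) before
    # pushing an incomplete batch, so a stalled source does not block others.
    streammux.set_property("batched-push-timeout", int(1_000_000 / source_fps))

    # Keep the inference batch size in step with the muxer.
    if pgie.get_property("batch-size") != num_sources:
        pgie.set_property("batch-size", num_sources)


# Usage with stand-ins; replace with the real pipeline elements.
streammux = StandInElement()
pgie = StandInElement(**{"batch-size": 1})
configure_batching(streammux, pgie, num_sources=2)
print(streammux.get_property("batch-size"), pgie.get_property("batch-size"))
```

If the model engine was built for a smaller batch size than the number of sources, the inference element may need its engine rebuilt for the new batch size; that depends on the model configuration, which the thread does not show.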

