Adding sources halves fps

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 6.1
• JetPack Version (valid for Jetson only)
• TensorRT Version: 8.2.5-1+cuda11.4
• NVIDIA GPU Driver Version (valid for GPU only): 510.73.05
• Issue Type( questions, new requirements, bugs): Question/bug
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing):

I’m currently using one of the examples from the DeepStream Python apps.

I have modified the script to use a custom YoloR model, following the instructions in this other repository: GitHub - marcoslucianops/DeepStream-Yolo: NVIDIA DeepStream SDK 6.1 / 6.0.1 / 6.0 configuration for YOLO models

I have also added code from other examples to measure the FPS and display it on the OSD.

The issue I’m having is that the per-stream inference speed halves whenever I add a source. I have several H.264 RTSP streams running at 15 fps. With a single stream, the pipeline runs at 15 fps and GPU utilisation sits at 10–12%. When I add a second stream, overall throughput and GPU utilisation stay the same, so each stream effectively drops to half the frame rate. The same thing happens each time I double the number of streams: total throughput stays at 15 fps and utilisation stays at 10–12%.

• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

1. How do you get the number of fps?
2. When you add a second stream, do you mean the output should be 30 fps?

Hello, to answer your questions:

  1. I’m getting the FPS with the PERF_DATA class from the common/ script, which gives me a number for each stream individually.
  2. I mean the output should be two 15 fps streams; instead, I was getting two 7.5 fps streams.
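For reference, PERF_DATA-style per-stream FPS counting can be sketched like this (an illustrative stand-in, not the actual deepstream_python_apps code; the class name `StreamFPS` and the injectable clock are my own):

```python
import time

class StreamFPS:
    """Minimal per-stream FPS counter, similar in spirit to the
    PERF_DATA helper in deepstream_python_apps (illustrative sketch)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable clock, handy for testing
        self._start = None
        self._frames = 0

    def update(self):
        """Call once per frame (e.g. from a sink-pad probe)."""
        if self._start is None:
            self._start = self._clock()
        self._frames += 1

    def fps(self):
        """Average frames per second since the first update."""
        if self._start is None:
            return 0.0
        elapsed = self._clock() - self._start
        return self._frames / elapsed if elapsed > 0 else 0.0
```

In the real pipeline you would keep one counter per source and update it from a buffer probe, picking the counter by the frame's pad index so each stream gets its own number.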

I managed to fix this over the weekend, although I’m not sure I fully understand the fix: I set the streammuxer batch-size to the number of sources and that did the trick. I don’t really understand why the streammuxer would not push more frames when there is capacity to process them.

Yes, you can modify batch-size to improve performance. The streammuxer forms a batched buffer of batch-size frames. If batch-size = 1, the streammuxer pushes one frame at a time, which increases the number of interactions between the CPU and GPU and does not leverage the GPU's parallelism.
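For the reference deepstream-app, the equivalent settings live in the [streammux] group of the config file (the values below are illustrative for a 4-source setup):

```ini
[streammux]
# Match batch-size to the number of sources so the muxer can
# assemble one full batch per inference call.
batch-size=4
# Microseconds to wait before pushing a partial batch; roughly
# 1/frame-rate (40000 us ~ 25 fps sources) is a common choice.
batched-push-timeout=40000
```

In the Python apps the same thing is done in code, e.g. streammux.set_property("batch-size", number_sources); remember to keep the nvinfer batch-size consistent as well, or the engine will be rebuilt or under-filled.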

