Degraded inference performance with stream_mux batch_size 16 or 32

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) GPU A4000 & T4
• DeepStream Version DS 6.1
• JetPack Version (valid for Jetson only)
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)

• Issue Type( questions, new requirements, bugs) questions
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

Below are our observations with DS 6.1 using the nvinferserver plugin as a gRPC client, with an object detection model.
Questions:

  1. If we set stream_mux batch_size to 16 or 32, we see missed detections on some sources. If we use batch_size 8, we do not see detection drops on any source. Video attached for the same.
  2. If we set sync-input true in the same use case, we see some missed detections, but with sync-input false we do not. With file-based input, we observed fewer frames in the output file when sync-input is true than when it is false, for the same source.
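For context, the usual DeepStream guidance is that the nvstreammux batch-size should match the number of active sources; a mismatch means the muxer either waits for a full batch or pushes partial batches after batched-push-timeout. A small sketch of that heuristic (the function and its messages are illustrative, not part of any DeepStream API):

```python
def streammux_batch_warnings(num_sources, batch_size, sync_inputs):
    """Heuristic sanity checks for a hypothetical nvstreammux setup.

    These are rules of thumb only: a batch-size larger than the number
    of live sources forces nvstreammux to wait for batched-push-timeout
    before pushing partial batches, which can surface downstream as
    empty or late frames.
    """
    warnings = []
    if batch_size > num_sources:
        warnings.append(
            "batch-size (%d) > sources (%d): partial batches are pushed "
            "only after batched-push-timeout" % (batch_size, num_sources))
    elif batch_size < num_sources:
        warnings.append(
            "batch-size (%d) < sources (%d): some sources wait for the "
            "next batch, adding latency" % (batch_size, num_sources))
    if sync_inputs and batch_size != num_sources:
        warnings.append(
            "sync-inputs with a batch/source mismatch increases the "
            "chance that late frames are dropped")
    return warnings

# Matched sizes raise no warnings; a 32 batch over 8 sources does.
print(streammux_batch_warnings(8, 8, False))
print(streammux_batch_warnings(8, 32, False))
```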

Are you using deepstream-app for the testing?

We are using our own DeepStream app / GStreamer wrapper based on your project.

| Sources | Batch | Sync | Avg # Frames | Avg # Empty Frames | Avg # Detections | Notes |
|---|---|---|---|---|---|---|
| 8 | 8 | Off | 1441.75 | 1.625 | 2955793.00 | |
| 8 | 8 | On | 1437.5 | 1.375 | 2946836.13 | |
| 8 | 32 | Off | 1442 | 46.5 | 2846977.88 | |
| 8 | 32 | On | 1437.5 | 1.375 | 2947558.88 | |
| 16 | 8 | Off | 1442 | 1.25 | 2962775.31 | |
| 16 | 8 | On | 726.25 | 1.1875 | 1487887.63 | |
| 16 | 32 | Off | 1442 | 24 | 14349.13 | Cam2/12 has low # detections, 13862 |
| 16 | 32 | On | 947.4375 | 61.8125 | 1821452.00 | Cam1 has very low # detections, 621919 |

There are multiple issues here. In summary,

  • Turning sync on causes frame drops (the drop level changes based on other factors)
  • An increased number of sources causes a higher level of frame drop when sync is on
  • Increased batch size and number of sources cause more empty frames, i.e. fewer detections
  • As long as the number of sources is 8, it performs OK regardless of batch size (sync-inputs still causes frame drops, though)
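One way to read the table above is detections per decoded frame: simple arithmetic on the reported averages (numbers copied from the table; the helper name is ours) suggests the per-frame detection rate stays roughly constant when sync is on, which points at dropped frames rather than per-frame detection loss:

```python
# (sources, batch, sync, avg_frames, avg_detections) taken from the table
rows = [
    (8, 8, "off", 1441.75, 2955793.00),
    (8, 32, "off", 1442, 2846977.88),
    (16, 8, "on", 726.25, 1487887.63),
]

def detections_per_frame(avg_frames, avg_detections):
    # Average detections per decoded frame; a drop here would indicate
    # missed objects, while a stable rate with fewer frames indicates
    # whole frames were dropped before inference.
    return avg_detections / avg_frames

for sources, batch, sync, frames, dets in rows:
    rate = detections_per_frame(frames, dets)
    print(f"{sources} src, batch {batch}, sync {sync}: {rate:.1f} det/frame")
```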

It seems like there is a limit on the number of sources a pipeline can have, no matter which GPU we use.
The T4 and RTX A4000 are quite different in terms of performance, yet we get the same result.


We fixed the flickering on the video by turning off sync on filesink. It somehow affects the number of detections and causes flickering. Now, we can have output without flickering on 32 sources with 32 batch size. However, we still see a large number of empty frames (frames with no detection). What is the reason for that?

And why does ‘sync-inputs: 1’ cause frame drops and worse performance?

With RTSP source, I see detection misses on some objects with batch-size 8 and sync-input 0.

Are you using deepstream-app ?

Hello Laura,

We have created a custom DeepStream app in Python, taking reference from the deepstream_python_apps runtime add/delete source example.

You can use deepstream_python_apps/apps/deepstream-test3 at master · NVIDIA-AI-IOT/deepstream_python_apps (github.com) to run the multiple input case performance.

Hello Fiona,

I have run the test with the deepstream-test3 app, with some changes to support visualisation of the output over RTSP. I see some detection drops across multiple sources.

Attached are the video and app source code for the same, along with the performance numbers.


app-output.txt (16.6 KB)
iXhMsPqupMqye8Bdl1ZnM8aT.webm
deepstream_test_3.py (20.3 KB)

Your command line?

python3 deepstream_test_3.py -i rtsp://192.168.168.21:8554/live/stream1 rtsp://192.168.168.21:8554/live/stream2 rtsp://192.168.168.21:8554/live/stream3 rtsp://192.168.168.21:8554/live/stream4 rtsp://192.168.168.21:8554/live/stream5 rtsp://192.168.168.21:8554/live/stream6 rtsp://192.168.168.21:8554/live/stream7 rtsp://192.168.168.21:8554/live/stream8 rtsp://192.168.168.21:8554/live/stream9 rtsp://192.168.168.21:8554/live/stream10 --pgie nvinferserver-grpc -c config_yolox_normal.txt --no-display --silent --disable-probe --rtsp-output

We used our own model for inferencing.

The gRPC and RTSP output both go over the network, which cannot guarantee the app's performance.

There is no update from you for a period, assuming this is not an issue anymore.
Hence we are closing this topic. If need further support, please open a new one.
Thanks

“sync-inputs” will drop frames that arrive later than 1/FPS, in order to synchronize all input sources according to their timestamps.
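That behaviour can be modelled very roughly as a per-frame deadline of one frame interval (this is a simplification; real nvstreammux works on buffer timestamps and latency settings, and the function here is purely illustrative):

```python
def kept_frames(arrival_offsets_ms, fps):
    """Simulate sync-inputs-style dropping: a frame arriving more than
    one frame interval (1/FPS) late relative to its expected slot is
    dropped, so all sources stay aligned on timestamps."""
    frame_interval_ms = 1000.0 / fps
    return [off for off in arrival_offsets_ms if off <= frame_interval_ms]

# At 30 FPS the budget is ~33.3 ms; frames arriving later are dropped.
offsets = [5.0, 20.0, 33.0, 40.0, 70.0]
print(kept_frames(offsets, 30))  # the 40 ms and 70 ms frames are dropped
```

This is consistent with RTSP sources seeing more drops than file sources: network jitter pushes more frames past the deadline.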

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.