Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU) GPU
• DeepStream Version 6.4
• TensorRT Version 8.6.1.6
• NVIDIA GPU Driver Version (valid for GPU only) 535.171.04
• Issue Type (questions, new requirements, bugs) Question
I’m working on optimizing our DeepStream pipeline to process long videos more quickly using multiple GPUs. Our current setup is as follows:
- The current pipeline is:
uridecodebin -> streammuxer -> pgie -> tracker -> osd -> fakesink
- This pipeline currently runs on a single GPU.
Our goal is to reduce the processing time by a factor of 4 by utilizing a machine with 4 GPUs. We’re specifically looking at ways to distribute the processing of a single long video across these 4 GPUs.
Currently I am considering splitting the video into chunks and distributing the chunks across the GPUs.
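To make the chunking idea concrete, here is a minimal sketch of the planning step I have in mind: divide the video's duration into equal time ranges and assign one range per GPU, so that each range can then be handled by a separate pipeline instance (e.g. with the `gpu-id` property set accordingly on the GPU-bound elements). The function name `plan_chunks` and the numbers are my own illustration, not DeepStream API.

```python
# Hypothetical chunk planner: split a video's duration evenly across GPUs.
# Each resulting (gpu_id, start_s, end_s) tuple would drive one pipeline
# instance processing only that time range of the source file.

def plan_chunks(duration_s: float, num_gpus: int):
    """Return a list of (gpu_id, start_s, end_s) tuples covering the video."""
    chunk = duration_s / num_gpus
    plan = []
    for gpu_id in range(num_gpus):
        start = gpu_id * chunk
        # Make the last chunk end exactly at the video's end to avoid
        # floating-point drift leaving a sliver unprocessed.
        end = duration_s if gpu_id == num_gpus - 1 else start + chunk
        plan.append((gpu_id, start, end))
    return plan

if __name__ == "__main__":
    # Example: a 1-hour video split across 4 GPUs.
    for gpu_id, start, end in plan_chunks(3600.0, 4):
        print(f"GPU {gpu_id}: process {start:.0f}s .. {end:.0f}s")
```

One concern with this approach: the tracker state does not carry across chunk boundaries, so object IDs would restart at every boundary and may need stitching afterwards.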
What are the best practices or recommended approaches for efficiently distributing the processing of a single long video across multiple GPUs in DeepStream? Are there any existing DeepStream plugins that could help with this task?
Any insights, examples, or suggestions would be appreciated!