• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 7.0 (docker image: nvcr.io/nvidia/deepstream:7.0-triton-multiarch)
• NVIDIA GPU Driver Version (valid for GPU only): 535.171.04
Hi,
I’m trying to understand the differences between two pipeline structures in DeepStream. Specifically, I’m comparing a pipeline similar to the one used in the deepstream_parallel_inference_app:
...... ! streammux ! nvinfer ! nvtracker ! streamdemux ! tee ! streammux ! nvinfer ! fakesink
! streammux ! nvinfer ! fakesink
with a simpler version that uses only the tee element:
...... ! streammux ! nvinfer ! nvtracker ! tee ! nvinfer ! fakesink
! nvinfer ! fakesink
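For concreteness, here is a minimal Python sketch (via Gst.parse_launch) of what I mean by the tee version. All file and config names here (input.mp4, pgie.txt, model_a.txt, model_b.txt, the tracker library path) are placeholders, and I've put a queue after each tee pad so that each branch gets its own streaming thread:

```python
#!/usr/bin/env python3
# Minimal sketch of the tee variant (all config/file paths are placeholders).
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

# A queue after each tee pad gives that branch its own streaming thread;
# without the queues, the two downstream nvinfer instances would execute
# one after the other in the tee's single upstream thread.
pipeline = Gst.parse_launch(
    "uridecodebin uri=file:///path/to/input.mp4 ! m.sink_0 "
    "nvstreammux name=m batch-size=1 width=1920 height=1080 ! "
    "nvinfer config-file-path=pgie.txt ! "
    "nvtracker ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/"
    "libnvds_nvmultiobjecttracker.so ! "
    "tee name=t "
    "t. ! queue name=q_a ! nvinfer config-file-path=model_a.txt ! fakesink "
    "t. ! queue name=q_b ! nvinfer config-file-path=model_b.txt ! fakesink"
)

pipeline.set_state(Gst.State.PLAYING)
loop = GLib.MainLoop()
try:
    loop.run()
finally:
    pipeline.set_state(Gst.State.NULL)
```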
I’ve already read through these forum discussions: “No increase using tee and parallel inference on AGX” and “Parallel branching in DeepStream 6.4”.
From what I understand, in the parallel inference case the buffer is copied for each branch, so every branch works on its own copy. In contrast, with the tee approach the same buffer is shared across all branches.
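To check the buffer-sharing point concretely, I extended the sketch above (added before setting the pipeline to PLAYING) with pad probes on the two branch queues q_a and q_b defined there; with tee, both branches should print the same GstBuffer address:

```python
# Extends the sketch above: probes on the two branch queues print the address
# of the GstBuffer each branch sees. With tee, both branches should report
# the same address (one shared buffer).
def on_buffer(pad, info, branch):
    buf = info.get_buffer()
    # hash() of a Gst.Buffer is the underlying C pointer in PyGObject -- the
    # same idiom DeepStream's Python bindings rely on, e.g.
    # pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer)).
    print(f"{branch}: GstBuffer @ {hash(buf):#x}")
    return Gst.PadProbeReturn.OK

for name in ("q_a", "q_b"):
    queue = pipeline.get_by_name(name)
    queue.get_static_pad("src").add_probe(
        Gst.PadProbeType.BUFFER, on_buffer, name
    )
```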
However, I’m still unclear on the benefits of parallel inference.
What does it mean to say that the models run in parallel in the context of parallel inference?
In which structure (parallel inference vs. tee) can each model run at its own frame rate, determined by its inference speed?
Any insights would be greatly appreciated!
Thanks.