We are using GStreamer together with NVIDIA's DeepStream ‘nvinfer’ GstElement for inference. The idea is to have one video stream and be able to instantaneously and dynamically select which combination of inference engines is applied to it. For instance, we might want no inference running at all, or both a person-detection and a car-detection model/engine running at once. Note that the inference elements are not configured to modify the video in any way - we only need the metadata they produce.
The key requirement is that switching happens almost instantly. We therefore hope to keep a static GStreamer pipeline layout and simply enable/disable certain elements at runtime as needed. When an inference element is disabled, it should not consume any resources.
So far we have attempted to use a ‘tee’ element branching off our video source, with a separate inference engine on each branch. A ‘valve’ element in front of each inference element lets us select at runtime which inference engines are in use.
Source ----- [Tee] ---------------------------- [Aggregator?] ----- Output
               |                                      |
               |----- [valve] ----- [inference1] -----|
               |                                      |
               |----- [valve] ----- [inference2] -----|
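In gst-launch syntax, the layout we are experimenting with looks roughly like the sketch below. This is only an illustration of the topology, not a working command: the source URI, the `person.txt`/`car.txt` nvinfer config files, and especially the `???` aggregator element are placeholders - the aggregator is exactly the part we have not been able to find.

```
gst-launch-1.0 \
  uridecodebin uri=... ! tee name=t \
  t. ! queue ! agg. \
  t. ! queue ! valve name=v1 drop=false ! nvinfer config-file-path=person.txt ! agg. \
  t. ! queue ! valve name=v2 drop=false ! nvinfer config-file-path=car.txt ! agg. \
  ??? name=agg ! fakesink
```

At runtime we would flip each valve's `drop` property to true/false to disable/enable that branch's inference engine.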
However, we are struggling to get the metadata produced by each inference branch back onto the main branch. The standard aggregator elements do not seem to work for this, as they wait for video frames on all of their input branches before aggregating - and a disabled branch (whose valve is dropping buffers) never delivers any.
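To make the desired behaviour concrete, here is a toy, plain-Python model of the merge we are after (no GStreamer involved): each enabled branch stashes its detections keyed by the frame's presentation timestamp, and the main branch collects whatever is available for that PTS. All names here (`MetadataMerger`, `on_branch_buffer`, `on_main_buffer`) are hypothetical stand-ins for pad-probe callbacks; the sketch deliberately ignores the real synchronisation problem of the main-branch buffer arriving before the inference branches have finished.

```python
from collections import defaultdict

class MetadataMerger:
    """Toy model: enabled inference branches deposit detections keyed
    by frame PTS; the main branch later merges whatever was stored."""

    def __init__(self, enabled_branches):
        self.enabled = set(enabled_branches)   # branches whose valves are open
        self.pending = defaultdict(dict)       # pts -> {branch: detections}

    def on_branch_buffer(self, branch, pts, detections):
        # Would be a pad probe on an inference branch: stash its metadata.
        if branch in self.enabled:
            self.pending[pts][branch] = detections

    def on_main_buffer(self, pts):
        # Would be a pad probe on the main branch: merge everything
        # stored for this PTS, from however many branches are enabled.
        merged = self.pending.pop(pts, {})
        return [d for dets in merged.values() for d in dets]

merger = MetadataMerger(enabled_branches={"person", "car"})
merger.on_branch_buffer("person", pts=1000, detections=["person@(10,20)"])
merger.on_branch_buffer("car", pts=1000, detections=["car@(50,60)"])
print(merger.on_main_buffer(1000))  # both branches' detections for frame 1000
```

Note that a disabled branch simply contributes nothing for a given PTS, and the merge still succeeds - which is the behaviour we cannot get out of the stock aggregators.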
So our question is: what is the best way to achieve this? Is our ‘parallel’ approach viable, and if so, how do we merge the metadata from the different branches back together? If not, what is an alternative way to achieve the same goal?