I’m working on a single-stream DeepStream application that uses one tee and two separate nvstreammux instances to render, via nvsegvisual, the outputs of two segmentation models inferring on the same frame. However, I’m encountering some issues:
Issue with nvsegvisual
Why doesn’t it work when using a single streammux before the tee? In that case, the segmentation from only one of the models appears in both tiles of the multi-stream tiler output.
Why does it work correctly when using two separate streammux instances after the tee?
I have a hypothesis on why using two separate streammux instances works, but I’m not sure: could it be because each streammux generates separate and independent metadata? Or is there a copy of the buffer being created under the hood?
Is this the correct approach for a parallel pipeline, or is there a more lightweight and elegant solution?
Note: I had to split the pipeline into two separate branches because nvsegvisual was displaying the output of only one segmentation model, even though the metadata contained results from both models. This happened both when the two models ran sequentially in a single branch and in a parallel pipeline with a single streammux before the tee.
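Roughly, the working layout looks like the sketch below (a simplified version, not my exact pipeline: file name, resolutions, element names, and nvinfer config paths are placeholders, and nvv4l2decoder stands in for the decoder on Jetson):

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# Working layout: tee first, then one nvstreammux per branch, so each branch
# gets its own batched surface before inference and visualization.
pipeline = Gst.parse_launch(
    "filesrc location=sample.h264 ! h264parse ! nvv4l2decoder ! tee name=t "
    "t. ! queue name=q1 ! mux0.sink_0 "
    "nvstreammux name=mux0 batch-size=1 width=1280 height=720 "
    "! nvinfer config-file-path=seg_model_0.txt "
    "! nvsegvisual name=segvisual0 width=1280 height=720 ! mux2.sink_0 "
    "t. ! queue name=q2 ! mux1.sink_0 "
    "nvstreammux name=mux1 batch-size=1 width=1280 height=720 "
    "! nvinfer config-file-path=seg_model_1.txt "
    "! nvsegvisual name=segvisual1 width=1280 height=720 ! mux2.sink_1 "
    "nvstreammux name=mux2 batch-size=2 width=1280 height=720 "
    "! nvmultistreamtiler rows=1 columns=2 width=1280 height=720 "
    "! nvdsosd ! nv3dsink"
)

pipeline.set_state(Gst.State.PLAYING)
bus = pipeline.get_bus()
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                       Gst.MessageType.EOS | Gst.MessageType.ERROR)
pipeline.set_state(Gst.State.NULL)
```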
Issue with mask overlay on the video
I know that nvsegvisual only renders the mask and does not handle overlaying it onto the video. Is there an optimized way to achieve this without using probe functions with NumPy/OpenCV? Keep in mind that I’m running this on a Jetson device, so I can’t use CuPy to access the buffer, as support is only available for x86 architecture.
Issue with pipeline branch synchronization
What is the correct way to synchronize the two branches of the pipeline? As mentioned earlier, I need to obtain the results from both models on the same frame for further processing. However, sometimes Model 0 is faster than Model 1, and other times the reverse happens. How can I ensure proper synchronization?
In the non-working pipeline, nvsegvisual in the first tee branch draws directly into the frame, so nvinfer in the second tee branch receives the already-drawn frame, which is not what you want. In the working pipeline, each nvstreammux creates a new frame, so the second tee branch receives a clean frame.
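To illustrate, a sketch of the problematic layout (placeholder names and configs, not your exact pipeline):

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# Non-working layout: a single mux before the tee. Both branches share the
# same batched surface, so nvsegvisual in the first branch paints the very
# pixels that nvinfer in the second branch reads next.
bad = Gst.parse_launch(
    "filesrc location=sample.h264 ! h264parse ! nvv4l2decoder ! mux.sink_0 "
    "nvstreammux name=mux batch-size=1 width=1280 height=720 ! tee name=t "
    "t. ! queue ! nvinfer config-file-path=seg_model_0.txt "
    "! nvsegvisual width=1280 height=720 ! fakesink "
    "t. ! queue ! nvinfer config-file-path=seg_model_1.txt "
    "! nvsegvisual width=1280 height=720 ! fakesink"
)
```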
about " mask overlay on the video", please refer to the follow cmd:
You can set the max-same-source-frames, max-num-frames-per-batch, and batched-push-timeout settings of the new streammux so that two different frames are composed into one batch. Please refer to the FAQ.
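For reference, the first two keys live in the config file passed to the new nvstreammux via its config-file-path property (a sketch; the key names follow the new nvstreammux documentation, and the values are placeholders to tune):

```
[property]
algorithm-type=1
batch-size=2
# cap frames per source so one batch pairs one frame from each branch
max-same-source-frames=1

[source-config-0]
max-num-frames-per-batch=1

[source-config-1]
max-num-frames-per-batch=1
```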
Thank you very much for the clarifications and the quick responses.
I’ve recently entered the awesome world of DeepStream, and there are still some aspects that are not entirely clear to me, particularly regarding buffer flow.
Would the working pipeline behavior be described as follows?
filesrc → h264parse → decoder → tee → q1, q2
Up to this point, a single GstBuffer (let’s call it buf0) has been created in memory, containing the frame data and its metadata.
The stream is split, but q1 and q2 both have pointers referencing the same buffer.
nvstreammux0 → nvinfer0 → nvsegvisual0
When adding nvstreammux0, a memory copy of GstBuffer is created (let’s call it buf1) and processed in branch0 of the pipeline.
nvstreammux1 → nvinfer1 → nvsegvisual1
Similarly, adding nvstreammux1 creates another memory copy of GstBuffer (let’s call it buf2) processed in branch1.
nvstreammux2 → tiler → osd → logger → sink
Since another nvstreammux is present, additional copies of the buffers are created, which we’ll call buf3 and buf4.
Is this understanding correct?
If so, I’m concerned that this implementation could be resource-intensive in terms of memory usage and latency, as multiple buffer copies (buf0, buf1, buf2, buf3, buf4) are created along the pipeline. Is it possible to achieve the same result as my working pipeline but with a more optimal design choice?
If my understanding is incorrect, could you please clarify how buffer references vs actual copies are handled in this context?
I am particularly interested in optimizing both efficiency and correctness when designing with this SDK.
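In the meantime, I was planning to check the copy-vs-reference question empirically by logging the underlying buffer address at a few pads (a sketch; q1, q2, mux0, and mux1 are the element names from my sketch above, and hash(buf) is the same pointer trick the DeepStream Python apps use, e.g. in pyds.gst_buffer_get_nvds_batch_meta(hash(buf))):

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

def make_logger(tag):
    # Log the address of the GstBuffer passing through this pad.
    def probe(pad, info):
        buf = info.get_buffer()
        print(f"{tag}: GstBuffer @ 0x{hash(buf):x} pts={buf.pts}")
        return Gst.PadProbeReturn.OK
    return probe

# 'pipeline' is the running pipeline from the sketch above.
for name in ("q1", "q2", "mux0", "mux1"):
    pad = pipeline.get_by_name(name).get_static_pad("src")
    pad.add_probe(Gst.PadProbeType.BUFFER, make_logger(name))
```

If q1 and q2 report the same address while mux0 and mux1 report new ones, that would confirm the tee shares one buffer and each nvstreammux produces a fresh surface.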
Side note: Sometimes the documentation may be outdated or misleading. For example, regarding nvsegvisual, it states: “This plugin shows only the segmentation output. It does not overlay output on the original NV12 frame.”
(source)
However, the caps negotiated between the src pad of nvinfer and the sink pad of nvsegvisual show string(NV12). Am I misunderstanding something here?
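This is how I read the negotiated caps at runtime, in case I am misinterpreting them (a sketch; 'segvisual0' is the element name from my sketch above):

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

# Print the caps actually negotiated on nvsegvisual's pads once the
# pipeline from the sketch above is PLAYING.
seg = pipeline.get_by_name("segvisual0")
for pad_name in ("sink", "src"):
    caps = seg.get_static_pad(pad_name).get_current_caps()
    print(pad_name, caps.to_string() if caps else "not negotiated yet")
```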