DeepStream Parallel Pipeline and Frame Synchronization

Setup:
• Hardware Platform: Jetson
• DeepStream Version: 7.0
• JetPack Version: 6.0
• TensorRT Version: 8.6.2.3
• Issue Type: Question

Hello everyone,

I’m working on a single-stream DeepStream application where I use one tee and two separate streammux instances to render, via nvsegvisual, the outputs of two segmentation models that run on the same frame. However, I’m encountering some issues:

  1. Issue with nvsegvisual
    • Why doesn’t it work when using a single streammux before the tee? In this case, the segmentations from only one of the models appear in both windows of the multi-stream tiler.
    • Why does it work correctly when using two separate streammux instances after the tee?

Architecture of the non-working pipeline:

Architecture of the working pipeline:
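
Since the diagrams may not come through here, this is roughly what I mean by the two layouts, written as Gst.parse_launch strings from Python. It is only a sketch of my understanding: file names, config paths, and the exact properties are placeholders, not my real values.

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# Non-working layout: a single nvstreammux sits BEFORE the tee, so both
# branches operate on the same batched buffer.
non_working = Gst.parse_launch(
    "filesrc location=video.h264 ! h264parse ! nvv4l2decoder ! mux.sink_0 "
    "nvstreammux name=mux batch-size=1 width=1280 height=720 ! tee name=t "
    "t. ! queue ! nvinfer config-file-path=model0.txt ! nvsegvisual ! mux2.sink_0 "
    "t. ! queue ! nvinfer config-file-path=model1.txt ! nvsegvisual ! mux2.sink_1 "
    "nvstreammux name=mux2 batch-size=2 width=1280 height=720 ! "
    "nvmultistreamtiler rows=1 columns=2 ! nvdsosd ! nv3dsink")

# Working layout: the tee comes right after the decoder and each branch has
# its own nvstreammux, so each branch gets its own batched frame.
working = Gst.parse_launch(
    "filesrc location=video.h264 ! h264parse ! nvv4l2decoder ! tee name=t "
    "t. ! queue ! mux0.sink_0 nvstreammux name=mux0 batch-size=1 width=1280 height=720 ! "
    "nvinfer config-file-path=model0.txt ! nvsegvisual ! mux2.sink_0 "
    "t. ! queue ! mux1.sink_0 nvstreammux name=mux1 batch-size=1 width=1280 height=720 ! "
    "nvinfer config-file-path=model1.txt ! nvsegvisual ! mux2.sink_1 "
    "nvstreammux name=mux2 batch-size=2 width=1280 height=720 ! "
    "nvmultistreamtiler rows=1 columns=2 ! nvdsosd ! nv3dsink")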

I have a hypothesis on why using two separate streammux instances works, but I’m not sure: could it be because each streammux generates separate and independent metadata? Or is there a copy of the buffer being created under the hood?

Is this the correct approach for a parallel pipeline, or is there a more lightweight and elegant solution?

Note: I had to split the pipeline into two separate branches because nvsegvisual was displaying the output of only one segmentation model, even though the metadata contained results from both models. This happened both in the case of a pipeline with sequential models and in the case of a parallel pipeline with a single streammux before the tee.

I read that a parallel pipeline does not provide any speed advantage over a sequential one, as the two instances of nvinfer are still executed asynchronously. However, as mentioned earlier, nvsegvisual only displays the output of one of the two models.

  2. Issue with mask overlay on the video
    I know that nvsegvisual only renders the mask and does not handle overlaying it onto the video. Is there an optimized way to achieve this without using probe functions with NumPy/OpenCV? Keep in mind that I’m running this on a Jetson device, so I can’t use CuPy to access the buffer, as support is only available for x86 architecture.

  3. Issue with pipeline branch synchronization
    What is the correct way to synchronize the two branches of the pipeline? As mentioned earlier, I need to obtain the results from both models on the same frame for further processing. However, sometimes Model 0 is faster than Model 1, and other times the reverse happens. How can I ensure proper synchronization?

Thanks in advance for your help!

  1. In the non-working pipeline, nvsegvisual in the first tee branch draws onto the frame, so the nvinfer in the second tee branch receives the already-drawn frame. This is not expected. In the working pipeline, nvstreammux creates a new frame, so the second tee branch receives a clean frame.
  2. about " mask overlay on the video", please refer to the follow cmd:
gst-launch-1.0 filesrc  location=/opt/nvidia/deepstream/deepstream/samples/streams/sample_720p_mjpeg.mp4  ! jpegparse ! nvv4l2decoder mjpeg=1 ! mux.sink_0 nvstreammux name=mux batch-size=1 width=1280 height=720 ! nvinfer config-file-path=dstest_segmentation_config_semantic.txt ! nvsegvisual width=512 height=512 original-background=true alpha=0.5  ! nv3dsink
  3. You can set the max-same-source-frames, max-num-frames-per-batch, and batched-push-timeout properties of the new nvstreammux to get the two different frames into one batch. Please refer to the FAQ.
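
For example, once the two branches are merged by the last nvstreammux, a probe on its src pad can read both frames of one batch and match them by frame number. This is only a rough Python sketch; the element name "mux2" and the use of pyds are illustrative, not taken from the FAQ.

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import pyds

def mux2_src_probe(pad, info, user_data):
    # Runs on every batched buffer leaving the merging nvstreammux.
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break
        # pad_index says which branch the entry came from; frame_num should
        # match across the two entries when both branches carry the same
        # source frame, which is how the two results can be paired up.
        print("branch", frame_meta.pad_index, "frame", frame_meta.frame_num)
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK

# Attach it once the pipeline is built (names are placeholders):
# mux2 = pipeline.get_by_name("mux2")
# mux2.get_static_pad("src").add_probe(Gst.PadProbeType.BUFFER, mux2_src_probe, None)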

Hi @fanzh,

Thank you very much for the clarifications and the quick responses.

I’ve recently entered the awesome world of DeepStream, and there are still some aspects that are not entirely clear to me, particularly regarding buffer flow.

Would the working pipeline behavior be described as follows?

  1. filesrc → h264parse → decoder → tee → q1, q2

    • Up to this point, a single GstBuffer (let’s call it buf0) has been created in memory, containing both the frame data and the metadata.
    • The stream is split, but q1 and q2 both have pointers referencing the same buffer.
  2. nvstreammux0 → nvinfer0 → nvsegvisual0

    • When adding nvstreammux0, a memory copy of GstBuffer is created (let’s call it buf1) and processed in branch0 of the pipeline.
  3. nvstreammux1 → nvinfer1 → nvsegvisual1

    • Similarly, adding nvstreammux1 creates another memory copy of GstBuffer (let’s call it buf2) processed in branch1.
  4. nvstreammux2 → tiler → osd → logger → sink

    • Since another nvstreammux is present, additional copies of the buffers are created, which we’ll call buf3 and buf4.

Is this understanding correct?

If so, I’m concerned that this implementation could be resource-intensive in terms of memory usage and latency, as multiple buffer copies (buf0, buf1, buf2, buf3, buf4) are created along the pipeline. Is it possible to achieve the same result as my working pipeline but with a more optimal design choice?

If my understanding is incorrect, could you please clarify how buffer references vs actual copies are handled in this context?
I am particularly interested in optimizing both efficiency and correctness when designing with this SDK.
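
For what it’s worth, my plan for checking this empirically is to hang buffer probes before and after each nvstreammux and compare the GstBuffer addresses, roughly as below. This is just a debugging sketch; a different GstBuffer address does not by itself prove that the pixel data was deep-copied.

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

def make_addr_probe(tag):
    # hash(gst_buffer) gives the C address of the GstBuffer (the same value
    # pyds expects), so printing it before and after an element shows whether
    # that element pushed a different buffer downstream.
    def probe(pad, info, user_data):
        buf = info.get_buffer()
        if buf:
            print(f"{tag}: GstBuffer at {hex(hash(buf))}")
        return Gst.PadProbeReturn.OK
    return probe

# Attach to static pads around a muxer (element names are from my pipeline
# and only serve as placeholders here):
# q0_src = pipeline.get_by_name("q0").get_static_pad("src")
# q0_src.add_probe(Gst.PadProbeType.BUFFER, make_addr_probe("before mux0"), None)
# mux0_src = pipeline.get_by_name("mux0").get_static_pad("src")
# mux0_src.add_probe(Gst.PadProbeType.BUFFER, make_addr_probe("after mux0"), None)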

Side note: Sometimes the documentation may be outdated or misleading. For example, regarding nvsegvisual, it states:
“This plugin shows only the segmentation output. It does not overlay output on the original NV12 frame.”

However, the caps negotiated between the src pad of nvinfer and the sink pad of nvsegvisual show format=(string)NV12. Am I misunderstanding something here?
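
For reference, one way to print the negotiated caps of a pad at runtime (standard GStreamer, nothing DeepStream-specific; the element name in the example comment is just a placeholder):

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

def print_negotiated_caps(pipeline, element_name, pad_name="sink"):
    # Print the caps actually negotiated on a pad once the pipeline is PLAYING,
    # e.g. print_negotiated_caps(pipeline, "segvisual0") with whatever name
    # was given to the nvsegvisual element.
    pad = pipeline.get_by_name(element_name).get_static_pad(pad_name)
    caps = pad.get_current_caps()
    print(element_name, pad_name, caps.to_string() if caps else "not negotiated yet")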

Thanks in advance for your support!

