'tee-queue' bug when used with nvstreammux

• Hardware Platform (Jetson / GPU): GeForce RTX 2080 Ti
• DeepStream Version: 6.0
• TensorRT Version: 8.0.1.6+cuda11.3.1.005
• NVIDIA GPU Driver Version (valid for GPU only): 470.103.01
• Issue Type (questions, new requirements, bugs): bugs

Hi all,

I’m currently working on a simple camera image-capture project.

I used the ‘tee-queue’ pattern but found some weird behavior, especially when DeepStream elements are involved.

Here is my first test. It is a plain GStreamer pipeline.

# TEST CASE 1:
$ gst-launch-1.0 \
    videotestsrc pattern=18 is-live=1 ! 'video/x-raw, width=1920, height=1080' ! \
    tee name=t \
    t. ! queue ! videoconvert ! nveglglessink sync=0 \
    t. ! queue max-size-buffers=0 max-size-bytes=0 max-size-time=0 ! videoconvert ! nveglglessink name=sink0 sync=0

I also wrote a gst-python probe function and hooked it onto the ‘sink’ pad of sink0 in the 2nd tee-branch. As you can see, the probe function is very simple: it just sleeps for a random number of milliseconds to simulate some processing cost during capture.

import random

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst, GLib

def capture_probe_func(pad, info):
    delay = random.randint(0, 2)
    GLib.usleep(delay * 200000)  # 200 ms x delay, to simulate some processing cost
    return Gst.PadProbeReturn.OK
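
For reference, the probe is attached roughly like this (a minimal sketch; the pipeline string is just the TEST CASE 1 launch line):

Gst.init(None)

pipeline = Gst.parse_launch(
    "videotestsrc pattern=18 is-live=1 ! video/x-raw,width=1920,height=1080 ! "
    "tee name=t "
    "t. ! queue ! videoconvert ! nveglglessink sync=0 "
    "t. ! queue max-size-buffers=0 max-size-bytes=0 max-size-time=0 ! "
    "videoconvert ! nveglglessink name=sink0 sync=0"
)

# Hook the 'sink' pad of sink0 in the 2nd tee-branch.
sink0 = pipeline.get_by_name("sink0")
sink0.get_static_pad("sink").add_probe(Gst.PadProbeType.BUFFER, capture_probe_func)

pipeline.set_state(Gst.State.PLAYING)
GLib.MainLoop().run()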

The result is as good as expected. Notice that the random delays in the 2nd tee-branch never block the 1st tee-branch, because the branches run in separate threads (thanks to the buffering queues). The display of the 1st tee-branch is very smooth.

Now here is my 2nd test, with DeepStream. Things went BAD.

# TEST CASE 2:
$ gst-launch-1.0 \
    videotestsrc pattern=18 is-live=1 ! 'video/x-raw, width=1920, height=1080, framerate=25/1' ! nvvideoconvert ! 'video/x-raw(memory:NVMM)' ! queue ! m.sink_0 \
    videotestsrc pattern=18 is-live=1 ! 'video/x-raw, width=1920, height=1080, framerate=25/1' ! nvvideoconvert ! 'video/x-raw(memory:NVMM)' ! queue ! m.sink_1 \
    videotestsrc pattern=18 is-live=1 ! 'video/x-raw, width=1920, height=1080, framerate=25/1' ! nvvideoconvert ! 'video/x-raw(memory:NVMM)' ! queue ! m.sink_2 \
    videotestsrc pattern=18 is-live=1 ! 'video/x-raw, width=1920, height=1080, framerate=25/1' ! nvvideoconvert ! 'video/x-raw(memory:NVMM)' ! queue ! m.sink_3 \
    nvstreammux name=m batch-size=4 width=1920 height=1080 live-source=1 batched-push-timeout=40000 sync-inputs=0 attach-sys-ts=0 nvbuf-memory-type=0 ! \
    tee name=t \
    t. ! queue ! nvmultistreamtiler rows=2 columns=2 ! nveglglessink sync=0 \
    t. ! queue max-size-buffers=0 max-size-bytes=0 max-size-time=0 ! fakesink name=sink0 sync=0

I used the same probe as above to hook the ‘sink’ pad of sink0 in the 2nd tee-branch.

But this time the display of the 1st branch choked frequently. The performance of the 1st branch was seriously affected by the random sleeps in the 2nd tee-branch. The queue in the 2nd tee-branch was totally USELESS for decoupling the two branches.
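
A quick way to check whether that queue is actually buffering anything is to watch its fill level (a small sketch, assuming the pipeline is built with Gst.parse_launch as in the snippet above and that the 2nd-branch queue is given name=q2 in the launch line):

def report_queue_level(q2):
    # 'current-level-buffers' is a standard property of the queue element.
    print("q2 buffers:", q2.get_property("current-level-buffers"))
    return True  # keep the periodic timeout running

q2 = pipeline.get_by_name("q2")
GLib.timeout_add(1000, report_queue_level, q2)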

WHY did this happen? All I want is:

  1. An ‘independent enough’ behavior of the 2nd tee-branch that does not block the 1st branch when nvstreammux is involved, which means supporting the buffering of batched frames in the ‘queue’ element.

  2. I also want my probe func to capture EVERY frame AT ITS OWN PACE (assuming I can afford a lot of queue buffers in the 2nd branch), instead of DISCARDING frames to keep up with the 1st branch by using `queue leaky=2 max-size-buffers=1` in the 2nd tee-branch.

  3. Behavior consistency between plain GStreamer & DeepStream for the tee-queue pattern.

Could anyone be kind enough to explain this to me?

Thanks in advance,

Ng.

nvstreammux uses a buffer pool to reuse its video buffers, so it will get stuck if any downstream plugin holds on to those buffers.
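
In other words, the slack available to the 2nd branch is bounded by the size of that pool. As a rough illustration (assuming the pipeline is driven from Python and that your nvstreammux version exposes the buffer-pool-size property), enlarging the pool only postpones the stall, it does not remove it:

# Must be set before the pipeline starts, i.e. before buffers are allocated.
mux = pipeline.get_by_name("m")
mux.set_property("buffer-pool-size", 16)  # more pool buffers = more slack, but still bounded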


Thanks for your kind reply!

I understand that GPU memory is precious, and the design decision of refcounting and pool blocking may be reasonable. But in many scenarios we need an ‘async message queue’-like behavior that does not lose frames (for example, high-fidelity frame capture), while at the same time keeping the display rendering smooth.

What I suggest is a mechanism that allows us to convert the GPU frame into a CPU frame (which could then be queued in much cheaper memory), while keeping the inference metadata attached to the converted frame, for the convenience of later analysis. Just for your consideration :)
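
Roughly what I have in mind, approximated with today's Python bindings (a rough sketch only, replacing the sleeping probe; it assumes RGBA, CPU-mappable NVMM surfaces upstream and the standard pyds bindings, and helper names like capture_worker are just illustrative): the probe copies each frame into system memory, hands it to a worker thread, and returns immediately so the batched NVMM buffer goes straight back to the pool.

import queue
import threading

import numpy as np
import pyds
from gi.repository import Gst

# CPU-side queue, decoupled from the NVMM buffer pool.
# Unbounded here; bound it if memory is a concern.
capture_queue = queue.Queue()

def capture_probe_func(pad, info):
    gst_buffer = info.get_buffer()
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        # Copy the GPU frame into ordinary system memory
        # (requires RGBA, CPU-mappable NVMM surfaces).
        surface = pyds.get_nvds_buf_surface(hash(gst_buffer), frame_meta.batch_id)
        cpu_frame = np.array(surface, copy=True, order='C')
        capture_queue.put((frame_meta.source_id, frame_meta.frame_num, cpu_frame))
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    # Return immediately: the batched NVMM buffer is released back to the
    # pool, so the 1st tee-branch keeps rendering smoothly.
    return Gst.PadProbeReturn.OK

def capture_worker():
    # The slow capture work runs here, at its own pace, on cheap CPU memory.
    while True:
        source_id, frame_num, cpu_frame = capture_queue.get()
        # ... save to disk, encode, analyze, etc.

threading.Thread(target=capture_worker, daemon=True).start()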

