No FPS increase using tee and parallel inference on AGX

Please provide complete information as applicable to your setup.

• Hardware Platform: Jetson AGX
• DeepStream Version: 6.3
• JetPack Version: 5.1
• TensorRT Version: 8.5.2
• Issue Type: Question

I have built a pipeline using the tee element to run 3 YOLOv6 models in parallel on a Jetson AGX. I was expecting a decrease in inference time; however, I am getting the same time as if they were running sequentially. My question: is it expected to gain more FPS by using the tee element and running the models in parallel?

What is the whole media pipeline? Please refer to the parallel inference sample deepstream_parallel_inference_app.


Thanks for your reply. Kindly find the pipeline below.

Sorry for the late reply. Is this still a DeepStream issue to support?
In theory, parallel inference (tee + queue) is faster than sequential inference. How did you measure the FPS?
Why do you need to run the same model in parallel?

These are not the same models. I have 3 models and need to build a pipeline to run them all. As they are not dependent on each other, I wanted to run them in parallel.

I am using the GstNvDslogger element to measure the FPS.
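For reference, the sketch below is a simple buffer-counting probe I can use to cross-check the nvdslogger numbers. It assumes a standard GStreamer Python setup; "sink0" is a placeholder for the name of the sink element in my pipeline.

# A minimal sketch, assuming a standard GStreamer Python setup; "sink0" is a
# placeholder for the name of my sink element.
import time
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

_state = {"count": 0, "start": time.time()}

def fps_probe(pad, info, u_data):
    # Count every buffer reaching this pad and print an FPS estimate once per second.
    _state["count"] += 1
    elapsed = time.time() - _state["start"]
    if elapsed >= 1.0:
        print(f"~{_state['count'] / elapsed:.1f} fps at {pad.get_parent_element().get_name()}")
        _state["count"] = 0
        _state["start"] = time.time()
    return Gst.PadProbeReturn.OK

# sink_pad = pipeline.get_by_name("sink0").get_static_pad("sink")   # placeholder name
# sink_pad.add_probe(Gst.PadProbeType.BUFFER, fps_probe, 0)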

  1. Here are some reasons. There is a buffer pool in streammux. Making a batch in streammux is fast while inference is slow, and streammux will always wait until the buffer returns to the pool. When using tee, the batched data is not copied into each queue; it is just a reference, so the buffer only returns to streammux's buffer pool after all the models finish inference. The total time consumption is therefore similar to sequential inference. In the sample deepstream_parallel_inference_app above, the pipeline is designed as "streammux + tee + streamdemux + streammux": the first streammux will not wait, because the buffer is returned right after the second streammux.
  2. deepstream_parallel_inference_app will merge the metadata from the different branches. If there is no need to merge metadata, why not use three gst-launch command-lines in your application, each running one model? (A rough sketch of this idea follows below.)
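For illustration, a rough Python equivalent of the three-command-line idea might look like the sketch below. It is not the sample app; the stream path is the DeepStream sample media, and the nvinfer config-file paths are placeholders.

# A rough sketch: three fully independent pipelines, one per model, in a
# single Python process. The nvinfer config-file paths are placeholders.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

STREAM = "/opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4"
CONFIGS = ["model1_config.txt", "model2_config.txt", "model3_config.txt"]  # placeholders

pipelines = []
for cfg in CONFIGS:
    desc = (
        f"filesrc location={STREAM} ! qtdemux ! h264parse ! nvv4l2decoder ! "
        f"m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! "
        f"nvinfer config-file-path={cfg} ! fakesink"
    )
    p = Gst.parse_launch(desc)
    p.set_state(Gst.State.PLAYING)
    pipelines.append(p)

GLib.MainLoop().run()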

OK, I was wondering why "streammux + tee + streamdemux + streammux" is used in your pipeline; I thought it was just to select one stream for each branch. I will add these elements to my pipeline and benchmark the performance.

Regarding the second point about merging the metadata: I am implementing the pipeline in Python because I want to integrate it with another application, and I am going to use probes to get the metadata. Do you think merging the metadata will increase the FPS?
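For reference, the sketch below is roughly the kind of pad probe I plan to attach. The pyds calls are taken from the DeepStream Python bindings; the element name "pgie0" is a placeholder.

# Rough sketch of the pad probe I plan to use to read the metadata.
# Assumes the DeepStream Python bindings (pyds) are installed; "pgie0" is a placeholder name.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import pyds

def meta_probe(pad, info, u_data):
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK
    # Walk the batch meta attached by nvstreammux / nvinfer.
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        print(f"source {frame_meta.pad_index}: {frame_meta.num_obj_meta} objects")
        l_frame = l_frame.next
    return Gst.PadProbeReturn.OK

# pgie_src_pad = pipeline.get_by_name("pgie0").get_static_pad("src")   # placeholder name
# pgie_src_pad.add_probe(Gst.PadProbeType.BUFFER, meta_probe, 0)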

It serves both purposes: the first is selecting the sources for each branch; the second is enabling parallel inference. The branches do not share the batched data, because the second streammux creates a new buffer for each branch.

No, merging the metadata will not increase the FPS.

Hello @fanzh,

Thank you for your support. Following your advice, I have built the pipeline below by adding "streammux + tee + streamdemux + streammux"; however, I am getting 0 FPS in the logger. I do not know what is wrong. Could you please check the pipeline?

Can you narrow down this issue? For example:

  1. Add printing in a probe function to check which element did not output data (see the probe sketch after this list).
  2. If using tee with only one branch, can the app run well?
  3. You might dump deepstream_parallel_inference_app's pipeline graph to do some comparisons.
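For point 1, a rough sketch of such a debug probe might look like the following (it assumes a GStreamer Python setup; "pgie1" is a placeholder for whichever element you want to inspect):

# A rough sketch: attach a probe to the src pad of each suspect element and
# print whenever it pushes a buffer; an element that never prints is the one
# that stopped producing data.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

def trace_probe(pad, info, name):
    print(f"{name} pushed a buffer, pts={info.get_buffer().pts}")
    return Gst.PadProbeReturn.OK

# Static src pads are named "src"; for tee / nvstreamdemux use the request-pad
# name instead, e.g. "src_0".
# pad = pipeline.get_by_name("pgie1").get_static_pad("src")   # placeholder name
# pad.add_probe(Gst.PadProbeType.BUFFER, trace_probe, "pgie1")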

The pipeline runs when I create it with only one branch, as follows:

Streamdemux is causing the issue. When I remove it from the second branch, the pipeline runs with no issues, as follows:

Finally, I tried keeping streamdemux in branch 1 and removing it from branches 2 and 3, and it worked; however, it had no effect on the FPS. The FPS is still the same.

Is streamdemux an essential element for building a correct parallel pipeline, or does it have no effect as long as I add the second streammux?

There has been no update from you for a while, so we are assuming this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

If there is no need to merge metadata, please refer to the following pipeline; it works well.

gst-launch-1.0 filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! m.sink_0 \
nvstreammux name=m batch-size=2 width=1280 height=1280 ! queue ! nvstreamdemux name=demux0 \
demux0.src_0 ! tee name=srctee0 \
srctee0. ! queue ! m0.sink_0 nvstreammux name=m0 batch-size=1 width=1280 height=720 ! fakesink \
srctee0. ! queue ! m1.sink_0 nvstreammux name=m1 batch-size=1 width=1280 height=720 ! fakesink

The simplified pipeline is:

......! nvv4l2decoder ! streammux ! streamdemux ! tee ! streammux ! fakesink
                                                      ! streammux ! fakesink
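Since the pipeline in this thread is being built from Python, the same gst-launch line above can be wrapped with Gst.parse_launch. Below is a rough sketch; the nvinfer elements (with placeholder config paths, shown only in comments) would replace the fakesinks.

# A rough sketch wrapping the gst-launch pipeline above with Gst.parse_launch.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

desc = (
    "filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4 ! "
    "qtdemux ! h264parse ! nvv4l2decoder ! m.sink_0 "
    "nvstreammux name=m batch-size=2 width=1280 height=1280 ! queue ! "
    "nvstreamdemux name=demux0 "
    "demux0.src_0 ! tee name=srctee0 "
    "srctee0. ! queue ! m0.sink_0 "
    "nvstreammux name=m0 batch-size=1 width=1280 height=720 ! fakesink "   # e.g. nvinfer config-file-path=model1.txt ! fakesink
    "srctee0. ! queue ! m1.sink_0 "
    "nvstreammux name=m1 batch-size=1 width=1280 height=720 ! fakesink"    # e.g. nvinfer config-file-path=model2.txt ! fakesink
)

pipeline = Gst.parse_launch(desc)
pipeline.set_state(Gst.State.PLAYING)
GLib.MainLoop().run()   # stop with Ctrl+C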

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.