I have built a pipeline using the tee element to run 3 YOLOv6 models in parallel on a Jetson AGX. I was expecting a decrease in inference time; however, I am getting the same time as if they were running sequentially. My question: is it expected to gain more FPS by using the tee element and running the models in parallel?
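For reference, here is a minimal sketch of the kind of pipeline I am describing: one source, one nvstreammux, and a tee into three nvinfer branches. The URI and the nvinfer config paths are placeholders for my actual files.

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

# Placeholder URI and nvinfer config paths; each branch runs one YOLOv6 model.
# Each nvinfer gets a distinct unique-id so their metadata can be told apart.
DESC = (
    "uridecodebin uri=file:///path/to/input.mp4 ! m.sink_0 "
    "nvstreammux name=m batch-size=1 width=1280 height=720 ! tee name=t "
    "t. ! queue ! nvinfer name=infer1 config-file-path=model1_config.txt unique-id=1 ! fakesink "
    "t. ! queue ! nvinfer name=infer2 config-file-path=model2_config.txt unique-id=2 ! fakesink "
    "t. ! queue ! nvinfer name=infer3 config-file-path=model3_config.txt unique-id=3 ! fakesink"
)

pipeline = Gst.parse_launch(DESC)
pipeline.set_state(Gst.State.PLAYING)

# Run until EOS or an error is posted on the bus.
loop = GLib.MainLoop()
bus = pipeline.get_bus()
bus.add_signal_watch()
bus.connect("message::eos", lambda bus, msg: loop.quit())
bus.connect("message::error", lambda bus, msg: loop.quit())
loop.run()
pipeline.set_state(Gst.State.NULL)
```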
Sorry for the late reply. Is this still a DeepStream issue to support?
In theory, parallel inference (tee + queue) is faster than sequential inference. How did you measure the FPS?
Why do you need to run the same model in parallel?
These are not the same model. I have 3 different models and need to build a pipeline to run them all. Since they do not depend on each other, I wanted to run them in parallel.
I am using the nvdslogger (GstNvDsLogger) element to measure the FPS.
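Concretely, I put one nvdslogger before the sink of each branch and read the per-branch FPS that it prints, roughly like this (branch template from my sketch above; config path is a placeholder):

```python
# One nvdslogger per tee branch prints that branch's FPS to the console,
# so each model's throughput can be read separately.
BRANCH = (
    "t. ! queue ! nvinfer name=infer{i} config-file-path={cfg} unique-id={i} "
    "! nvdslogger ! fakesink "
)
```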
Here are some reasons. There is a buffer pool in streammux. Making a batch in streammux is fast while inference is slow, and streammux will always wait until the buffer returns to the pool. When using tee, the batch data is not copied into each queue; every branch only holds a reference. Only after all the models finish inference does the buffer return to streammux's buffer pool, so the overall consumption is similar to sequential inference. In the deepstream_parallel_inference_app sample above, the pipeline is designed as "streammux + tee + streamdemux + streammux": the first streammux does not have to wait, because the buffer is returned right after the second streammux.
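A rough sketch of that "streammux + tee + streamdemux + streammux" layout for a single source and three branches (properties and config paths are placeholders, and the metadata merging done by the real sample is omitted here):

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

SRC = (
    "uridecodebin uri=file:///path/to/input.mp4 ! m.sink_0 "
    "nvstreammux name=m batch-size=1 width=1280 height=720 ! tee name=t "
)

def branch(i: int, cfg: str) -> str:
    # Each branch demuxes the shared batch and re-batches it in its own
    # nvstreammux, so the buffer returns to the first mux's pool right away
    # instead of being held until every model finishes inference.
    return (
        f"t. ! queue ! nvstreamdemux name=d{i} "
        f"d{i}.src_0 ! b{i}.sink_0 "
        f"nvstreammux name=b{i} batch-size=1 width=1280 height=720 "
        f"! nvinfer name=infer{i} config-file-path={cfg} unique-id={i} ! fakesink "
    )

configs = ["model1_config.txt", "model2_config.txt", "model3_config.txt"]
pipeline = Gst.parse_launch(
    SRC + "".join(branch(i, c) for i, c in enumerate(configs, start=1)))
pipeline.set_state(Gst.State.PLAYING)
# Bus watching and the main loop are omitted for brevity.
```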
deepstream_parallel_inference_app will also merge the metadata from the different branches. If there is no need to merge metadata, why not use three gst-launch command lines in your application, each running one model?
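For illustration, here are three fully independent single-model pipelines driven from one Python script, which is equivalent to three separate gst-launch-1.0 command lines (the URI and config paths are placeholders):

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

# One self-contained pipeline per model; nothing is shared between them,
# so no metadata merging is needed.
DESC = (
    "uridecodebin uri=file:///path/to/input.mp4 ! m{i}.sink_0 "
    "nvstreammux name=m{i} batch-size=1 width=1280 height=720 "
    "! nvinfer config-file-path={cfg} unique-id={i} ! nvdslogger ! fakesink"
)

pipelines = []
for i, cfg in enumerate(
        ["model1_config.txt", "model2_config.txt", "model3_config.txt"], start=1):
    p = Gst.parse_launch(DESC.format(i=i, cfg=cfg))
    p.set_state(Gst.State.PLAYING)
    pipelines.append(p)

GLib.MainLoop().run()
```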
OK, I was wondering why "streammux + tee + streamdemux + streammux" is used in your pipeline; I thought it was just to select one stream for each branch. I will add these elements to the pipeline and benchmark the performance.
Regarding the second point about merging the metadata: I am implementing the pipeline in Python because I want to integrate it with another application, and I am going to use probes to get the metadata. Do you think merging the metadata will increase the FPS?
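For context, this is roughly how I plan to read the metadata in a pad probe (a sketch using the pyds bindings, following the deepstream_python_apps probe pattern; the element name comes from my sketch above):

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import pyds

def infer_src_probe(pad, info, u_data):
    buf = info.get_buffer()
    if not buf:
        return Gst.PadProbeReturn.OK
    # Walk the DeepStream batch -> frame -> object metadata lists.
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(buf))
    if not batch_meta:
        return Gst.PadProbeReturn.OK
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            # unique_component_id matches the producing nvinfer's unique-id.
            print(frame_meta.frame_num, obj_meta.unique_component_id,
                  obj_meta.class_id, obj_meta.confidence)
            l_obj = l_obj.next
        l_frame = l_frame.next
    return Gst.PadProbeReturn.OK

# Attach one probe per branch, e.g.:
# pipeline.get_by_name("infer1").get_static_pad("src").add_probe(
#     Gst.PadProbeType.BUFFER, infer_src_probe, 0)
```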
The first function is selecting a source for each branch; the second function is enabling parallel inference. The branches do not operate on shared batch data, because the second streammux creates a new buffer for each branch.
Thank you for your support. I have built the following pipeline per your advice, adding "streammux + tee + streamdemux + streammux"; however, I am getting 0 FPS in the logger and I do not know what is wrong. Could you please check the pipeline?
There has been no update from you for a while, so we are assuming this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.
If there is no need to merge metadata, please refer to the following pipeline; it works well.