Same config, same sources, and same model engine file, but deepstream-app and test3 don't show the same performance

Hi,

I am feeding 16 input sources to deepstream-app and to test3, using the same config, the same sources, and the same model engine file. My system configuration: Ubuntu 18.04, DeepStream 5.0, GTX 1070. The YOLOv3 input image size is 320.
The deepstream-app latency output is as follows:

Comp name = nvv4l2decoder12 in_system_timestamp = 1589076866411.476074 out_system_timestamp = 1589076866513.801025 component latency= 102.324951
Comp name = src_bin_muxer source_id = 3 pad_index = 3 frame_num = 470 in_system_timestamp = 1589076866669.389893 out_system_timestamp = 1589076866677.427979 component_latency = 8.038086
Comp name = nvv4l2decoder3 in_system_timestamp = 1589076866406.945068 out_system_timestamp = 1589076866513.219971 component latency= 106.274902
Comp name = src_bin_muxer source_id = 5 pad_index = 5 frame_num = 470 in_system_timestamp = 1589076866664.779053 out_system_timestamp = 1589076866677.427979 component_latency = 12.648926
Comp name = primary_gie in_system_timestamp = 1589076866677.454102 out_system_timestamp = 1589076866734.322998 component latency= 56.868896
Comp name = tracking_tracker in_system_timestamp = 1589076866734.341064 out_system_timestamp = 1589076866750.394043 component latency= 16.052979
Comp name = tiled_display_tiler in_system_timestamp = 1589076866756.521973 out_system_timestamp = 1589076866769.393066 component latency= 12.871094
Comp name = nvosd0 in_system_timestamp = 1589076866770.602051 out_system_timestamp = 1589076866772.271973 component latency= 1.669922

The test3 latency output is as follows:

Comp name = nvv4l2decoder12 in_system_timestamp = 1589076914243.735107 out_system_timestamp = 1589076914513.468018 component latency= 269.732910
Comp name = stream-muxer in_system_timestamp = 1589076914513.476074 out_system_timestamp = 1589076914594.230957 component latency= 80.754883
Comp name = nvv4l2decoder13 in_system_timestamp = 1589076914161.397949 out_system_timestamp = 1589076914505.507080 component latency= 344.109131
Comp name = stream-muxer in_system_timestamp = 1589076914505.520020 out_system_timestamp = 1589076914594.230957 component latency= 88.710938
Comp name = nvv4l2decoder1 in_system_timestamp = 1589076914306.117920 out_system_timestamp = 1589076914513.448975 component latency= 207.331055
Comp name = stream-muxer in_system_timestamp = 1589076914513.465088 out_system_timestamp = 1589076914594.230957 component latency= 80.765869
Comp name = nvv4l2decoder3 in_system_timestamp = 1589076914233.283936 out_system_timestamp = 1589076914517.877930 component latency= 284.593994
Comp name = stream-muxer in_system_timestamp = 1589076914517.884033 out_system_timestamp = 1589076914594.230957 component latency= 76.346924
Comp name = primary-nvinference-engine in_system_timestamp = 1589076914594.248047 out_system_timestamp = 1589076914819.144043 component latency= 224.895996
Comp name = tracker in_system_timestamp = 1589076914819.154053 out_system_timestamp = 1589076914835.211914 component latency= 16.057861
Comp name = nvtiler in_system_timestamp = 1589076914840.446045 out_system_timestamp = 1589076914850.270996 component latency= 9.824951
Comp name = nv-onscreendisplay in_system_timestamp = 1589076914850.374023 out_system_timestamp = 1589076914888.256104 component latency= 37.882080

YOLOv3 performs differently as the primary inference engine in the two apps: in deepstream-app its component latency is about 50-60 ms, but in test3 it is above 200 ms. To reiterate, both runs use the same cfg file, the same model engine file, and the same sources.
Could you help me investigate this? Thanks a lot.
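
To compare the two runs more easily, the latency lines can be averaged per component. This is a minimal sketch that assumes the log format shown above; the function name summarize_latency and the regular expression are illustrative and not part of DeepStream.

```python
import re
from collections import defaultdict

# Matches both spellings seen in the logs:
# "component latency= 56.868896" and "component_latency = 8.038086".
LATENCY_LINE = re.compile(
    r"Comp name = (\S+).*?component[ _]latency\s*=\s*([\d.]+)"
)


def summarize_latency(log_text):
    """Return the mean component latency (ms) for each component name."""
    samples = defaultdict(list)
    for line in log_text.splitlines():
        match = LATENCY_LINE.search(line)
        if match:
            samples[match.group(1)].append(float(match.group(2)))
    return {name: sum(values) / len(values) for name, values in samples.items()}
```

Running it once on each log makes the gap visible at a glance: in the excerpts above, primary_gie reports 56.868896 while primary-nvinference-engine reports 224.895996.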

We have added queues after every element in the test3 app, which improved the fps. The following Python code snippet shows creating the queues, adding them to the pipeline, and linking one after each element. The same applies to the C version of this app, which will be available in an upcoming release.

# Create one queue for each link between pipeline elements.
queue1=Gst.ElementFactory.make("queue","queue1")
queue2=Gst.ElementFactory.make("queue","queue2")
queue3=Gst.ElementFactory.make("queue","queue3")
queue4=Gst.ElementFactory.make("queue","queue4")
queue5=Gst.ElementFactory.make("queue","queue5")
# Add the queues to the pipeline before linking them.
pipeline.add(queue1)
pipeline.add(queue2)
pipeline.add(queue3)
pipeline.add(queue4)
pipeline.add(queue5)

streammux.link(queue1)
queue1.link(pgie)
pgie.link(queue2)
queue2.link(tiler)
tiler.link(queue3)
queue3.link(nvvidconv)
nvvidconv.link(queue4)
queue4.link(nvosd)
# queue5 follows nvosd in both cases; the aarch64 branch adds transform before the sink.
nvosd.link(queue5)
if is_aarch64():
    queue5.link(transform)
    transform.link(sink)
else:
    queue5.link(sink)
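
The same pattern can also be written as a loop so that no link is missed when elements are added or removed. A GStreamer queue element introduces a thread boundary, which lets the elements on either side run in separate streaming threads. The sketch below is only an illustration: link_with_queues and its make_queue parameter are hypothetical names, not part of the test3 app or the DeepStream API, and it relies only on pipeline.add() and element.link() as used in the snippet above.

```python
def link_with_queues(pipeline, elements, make_queue):
    """Link elements in order, inserting one queue between each adjacent pair.

    pipeline   : object with an add(element) method, e.g. a Gst.Pipeline
    elements   : ordered list of elements already added to the pipeline
    make_queue : callable that takes a name and returns a new queue element,
                 e.g. lambda name: Gst.ElementFactory.make("queue", name)
    Returns the list of queues that were created.
    """
    queues = []
    pairs = zip(elements, elements[1:])
    for index, (upstream, downstream) in enumerate(pairs, start=1):
        queue = make_queue(f"queue{index}")
        if queue is None:
            raise RuntimeError(f"could not create queue{index}")
        pipeline.add(queue)
        # link() returns a falsy value on failure, so check both hops.
        if not upstream.link(queue) or not queue.link(downstream):
            raise RuntimeError(f"could not link through queue{index}")
        queues.append(queue)
    return queues
```

With the elements from the snippet above, a call on a non-aarch64 machine might look like link_with_queues(pipeline, [streammux, pgie, tiler, nvvidconv, nvosd, sink], lambda name: Gst.ElementFactory.make("queue", name)).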