Unexpected FPS drop with back-to-back detector concept in deepstream-app

• Hardware Platform: NVIDIA Tesla T4
• DeepStream Version: 5.1
• TensorRT Version: 7.2.2.3
• NVIDIA GPU Driver Version: 460.32.03
• Issue Type: question

Hi,
I am using a back-to-back detector with 4 video streams.

These are the FPS observations:
1) YOLOv4-tiny based object detector as primary detector only (no secondary GIE):
Performance:

**PERF: 155.15 (154.35) 155.15 (154.35) 155.15 (154.35) 155.15 (154.35)
**PERF: 158.66 (156.92) 158.66 (156.92) 158.66 (156.92) 158.66 (156.92)
**PERF: 157.15 (156.91) 157.15 (156.91) 157.15 (156.91) 157.15 (156.91)
**PERF: 157.98 (157.19) 157.98 (157.19) 157.98 (157.19) 157.98 (157.19)
**PERF: 165.82 (159.00) 165.82 (159.00) 165.82 (159.00) 165.82 (159.00)

2) CenterFace as primary detector (no secondary GIE):
Performance:

**PERF: 190.04 (189.98) 190.04 (189.98) 190.04 (189.98) 190.04 (189.98)
**PERF: 205.32 (198.29) 205.32 (198.29) 205.32 (198.29) 205.32 (198.29)
**PERF: 205.48 (201.02) 205.48 (201.02) 205.48 (201.02) 205.48 (201.02)
**PERF: 206.76 (202.28) 206.76 (202.28) 206.76 (202.28) 206.76 (202.28)
**PERF: 203.99 (202.84) 203.99 (202.84) 203.99 (202.84) 203.99 (202.84)

3) YOLOv4-tiny as primary detector and CenterFace as secondary GIE:
Performance:

**PERF: 27.02 (26.32) 26.40 (25.76) 26.40 (25.76) 26.40 (25.76)
**PERF: 29.18 (27.90) 29.18 (27.61) 29.18 (27.61) 29.18 (27.61)
**PERF: 29.30 (28.68) 29.30 (28.48) 29.30 (28.48) 29.30 (28.48)
**PERF: 33.12 (29.85) 33.12 (29.68) 33.12 (29.68) 33.12 (29.68)

Detections are correct, but I am not sure why the FPS dropped so significantly.

Here are the [primary-gie] and [secondary-gie] groups:

[primary-gie]
enable=1
gpu-id=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary.txt

[secondary-gie]
enable=1
gpu-id=0
model-engine-file=model/centerface.onnx_b4_gpu0_fp16.engine
batch-size=4
interval=0
gie-unique-id=2
nvbuf-memory-type=0
config-file=config_infer_primary_centerface.txt
operate-on-gie-id=1

Is anything missing in the configuration, or is this expected with a back-to-back detector?
Also, I was thinking the secondary detector runs on the primary GIE's detections, not on the full frame. Is that the reason for the FPS drop?
Thanks.

Can you upload the nvinfer config files of the two models?

For yolo_tiny:
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
custom-network-config=model/yolov4-tiny.cfg
model-file=model/yolov4-tiny.weights
model-engine-file=model/model_b4_gpu0_fp16.engine
labelfile-path=labels.txt
batch-size=4
network-mode=2
num-detected-classes=80
interval=0
gie-unique-id=1
#process-mode=1
network-type=0
cluster-mode=4
maintain-aspect-ratio=0
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet

[class-attrs-all]
pre-cluster-threshold=0.25

For centerface:
[property]
gpu-id=0
#net-scale-factor=0.0039215697906911373
#net-scale-factor=1
#0=RGB, 1=BGR
model-color-format=0
onnx-file=model/centerface.onnx
batch-size=4
network-mode=2
num-detected-classes=1
gie-unique-id=2
network-type=0
#output_tensor_meta=0
cluster-mode=0
#maintain-aspect-ratio=1
parse-bbox-func-name=NvDsInferParseCustomYoloV4
custom-lib-path=nvdsinfer_custom_impl_centerface/libnvdsinfer_custom_impl_centerface.so
#scaling-filter=0
#scaling-compute-hw=0
#labelfile-path=labels_yolo.txt
labelfile-path=centerface_labels.txt
#process-mode=0

[class-attrs-all]
nms-iou-threshold=0.6
pre-cluster-threshold=0.4

This is not back-to-back. Back-to-back needs PGIE + SGIE; you are using two PGIEs.

Are you using deepstream-app? What is the deepstream-app configuration?

Hi,
I don’t understand why this is not PGIE + SGIE.
deepstream-app_config.txt contains PGIE and SGIE groups.
And there are 2 different nvinfer configs for the PGIE and SGIE respectively.

NOTE:
This is not the back-to-back detector app, but deepstream-app with PGIE + SGIE; it only follows the back-to-back detector concept. Sorry if my title confused you, I will update the title.

Please find the attached config files.
config-files.zip (2.9 KB)

I’ve checked the config files; they are all PGIEs. Only “process-mode=2” indicates an SGIE, and there is none in your configurations.
https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvinfer.html#id2
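
As a rough sketch of what that means in practice (property names are from the gst-nvinfer documentation above; the values are only placeholders for your setup), the CenterFace nvinfer config would need something like this in its [property] group to run as a true SGIE on the YOLO detections:

[property]
# (keep the existing CenterFace model/engine properties as they are)
gie-unique-id=2
# 2 = secondary mode: infer on object crops produced by an upstream GIE, not on full frames
process-mode=2
# only operate on objects coming from the GIE with gie-unique-id=1
operate-on-gie-id=1
# optional: skip very small object crops (placeholder values)
input-object-min-width=32
input-object-min-height=32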

OK,
so what should I do to make deepstream-app work as PGIE + SGIE? If "process-mode" is the only criterion, then I have already tried the following:

1. I put "process-mode=2" in [secondary-gie] in deepstream_app_config.txt.
It reported that "process-mode" is not recognised.

2. I tried both "process-mode=1" and "process-model=2" in [property] in config_infer_primary_centerface.txt.
The FPS output is the same as before.

So I am not sure what I should configure, and where?

Please configure "process-model=2" in config_infer_primary_centerface.txt.

In PGIE + SGIE mode, the SGIE relies on the output from the PGIE. The FPS is determined by the performance of the whole pipeline, not simply by the PGIE speed and the SGIE speed.

If you want to find the bottleneck of the pipeline, you can measure the per-component latency according to the DeepStream SDK FAQ - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums.

Also, to sum things up:

If this is working as 2 PGIEs, then do you think the FPS numbers mentioned at the top are correct?
That is, if I run the models independently the FPS is above 150, but if I run them as 2 PGIEs the FPS drops to 30?

It is hard to say whether it is correct or not. The application works in an asynchronous way; it is not a simple linear relationship.
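
As a rough back-of-envelope illustration using the standalone numbers you posted (this ignores batching details and pipeline overlap, so treat it only as an approximation, not a prediction):

1/155 s ≈ 6.5 ms per batch for the YOLO engine alone
1/200 s ≈ 5.0 ms per batch for the CenterFace engine alone
fully serial: ≈ 11.5 ms per batch → ≈ 87 FPS per stream
observed: ≈ 30 FPS per stream → ≈ 33 ms per batch

So even a fully serial arrangement of the two engines would not by itself explain ~30 FPS; roughly 20 ms per batch is being spent elsewhere (pre/post-processing, scaling, buffer copies, waiting), which is exactly what the per-component latency measurement is meant to reveal.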

Correct me if I am wrong: it is process-mode=2, right? Not process-model.

Also, I have tested with this, and the output FPS is still 30.

Yes, it is "process-mode".

OK.

So, to conclude, the pipeline runs like:
prepare image -> primary inference -> tracking -> prepare image for secondary GIE -> secondary inference -> mux both -> show.
If that is the case, then I can see why the FPS may drop.

Because if it were concurrent, the secondary detector would add hardly 2~3 ms of delay, which would not affect things much.

I tried:

export NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1
export NVDS_ENABLE_LATENCY_MEASUREMENT=1

but the log shows:
Batch meta not found for buffer 0x7f33ec0655a0
BATCH-NUM = 340**
Batch meta not found for buffer 0x7f33d80393f0
BATCH-NUM = 341**
Batch meta not found for buffer 0x7f33d80391d0
BATCH-NUM = 342**
Batch meta not found for buffer 0x7f33ec0655a0

OK, now I am able to see the latency measurements.
The logs are:

BATCH-NUM = 124**
Comp name = nvv4l2decoder3 in_system_timestamp = 1623147424852.312012 out_system_timestamp = 1623147425008.355957 component latency= 156.043945
Comp name = src_bin_muxer source_id = 0 pad_index = 0 frame_num = 124 in_system_timestamp = 1623147425008.459961 out_system_timestamp = 1623147425033.969971 component_latency = 25.510010
Comp name = nvv4l2decoder1 in_system_timestamp = 1623147424852.294922 out_system_timestamp = 1623147425008.497070 component latency= 156.202148
Comp name = src_bin_muxer source_id = 1 pad_index = 1 frame_num = 124 in_system_timestamp = 1623147425008.537109 out_system_timestamp = 1623147425033.969971 component_latency = 25.432861
Comp name = nvv4l2decoder0 in_system_timestamp = 1623147424852.372070 out_system_timestamp = 1623147425008.332031 component latency= 155.959961
Comp name = src_bin_muxer source_id = 2 pad_index = 2 frame_num = 124 in_system_timestamp = 1623147425008.378906 out_system_timestamp = 1623147425033.969971 component_latency = 25.591064
Comp name = nvv4l2decoder2 in_system_timestamp = 1623147424852.584961 out_system_timestamp = 1623147425008.678955 component latency= 156.093994
Comp name = src_bin_muxer source_id = 3 pad_index = 3 frame_num = 124 in_system_timestamp = 1623147425008.724121 out_system_timestamp = 1623147425033.970947 component_latency = 25.246826
Comp name = primary_gie in_system_timestamp = 1623147425034.010010 out_system_timestamp = 1623147425050.318115 component latency= 16.308105
Comp name = secondary_gie_0 in_system_timestamp = 1623147425089.768066 out_system_timestamp = 1623147425155.408936 component latency= 65.640869
Comp name = tiled_display_tiler in_system_timestamp = 1623147425155.537109 out_system_timestamp = 1623147425173.820068 component latency= 18.282959
Comp name = osd_conv in_system_timestamp = 1623147425173.997070 out_system_timestamp = 1623147425175.420898 component latency= 1.423828
Comp name = nvosd0 in_system_timestamp = 1623147425175.499023 out_system_timestamp = 1623147425179.126953 component latency= 3.627930
Source id = 0 Frame_num = 124 Frame latency = 326.924072 (ms)
Source id = 1 Frame_num = 124 Frame latency = 326.941162 (ms)
Source id = 2 Frame_num = 124 Frame latency = 326.864014 (ms)
Source id = 3 Frame_num = 124 Frame latency = 326.651123 (ms)

It seems the buffers stay in the nvv4l2decoder component the longest. You may try the methods in the Troubleshooting — DeepStream 6.1.1 Release documentation to improve the performance.
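
For example (these are only the kind of settings that guide typically points to, with placeholder values, not verified fixes for your setup), you could check the following groups in the deepstream-app config:

[source0]
# give the decoder a few extra output surfaces so it is not starved (placeholder value)
num-extra-surfaces=5

[streammux]
# for file sources keep live-source=0 and set batched-push-timeout close to one frame interval (in microseconds)
live-source=0
batched-push-timeout=40000

[sink0]
# do not throttle rendering to the stream framerate while measuring throughput
sync=0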