Latency numbers per frame - yolov3 + 3 Resnet18's mismatch

Hi I want to understand how to correlate the latency numbers reported by enabling the flag “export enable component latency measurement = 1” for yolov3 as primary + 3 Resnet18 secondary classifiers (which comes with the ds4)

This is on T4 (deepstream4): I am running 1080p and getting ~70fps peak performance (no frame drops) which means the latency is 1/70 = 14ms.

But the latency reported using the per frame in unbelievably higher than acheived… may i know the reason for this. Below is result for 1 1080p10 source & 7 1080p10 sources.

1 1080p10 source :

BATCH-NUM = 180**
Comp name = nvv4l2decoder0 in_system_timestamp = 1569226560472.272949 out_system_timestamp = 1569226560522.977051 component latency= 50.704102
Comp name = src_bin_muxer source_id = 0 pad_index = 0 frame_num = 180 in_system_timestamp = 1569226560523.029053 out_system_timestamp = 1569226560528.311035 component_latency = 5.281982
Comp name = primary_gie_classifier in_system_timestamp = 1569226560528.447021 out_system_timestamp = 1569226560577.937988 component latency= 49.490967
Comp name = tracking_tracker in_system_timestamp = 1569226560577.945068 out_system_timestamp = 1569226560578.068115 component latency= 0.123047
Comp name = secondary_gie_0 in_system_timestamp = 1569226560578.114014 out_system_timestamp = 1569226560579.208984 component latency= 1.094971
Comp name = secondary_gie_1 in_system_timestamp = 1569226560578.132080 out_system_timestamp = 1569226560579.330078 component latency= 1.197998
Comp name = secondary_gie_2 in_system_timestamp = 1569226560578.135986 out_system_timestamp = 1569226560579.347900 component latency= 1.211914
Comp name = tiled_display_tiler in_system_timestamp = 1569226560580.324951 out_system_timestamp = 1569226560582.762939 component latency= 2.437988
Source id = 0 Frame_num = 180 Frame latency = 110.661133 (ms)

For 7 sources its
Comp name = nvv4l2decoder5 in_system_timestamp = 1569318821354.552002 out_system_timestamp = 1569318821610.374023 component latency= 255.822021
Comp name = src_bin_muxer source_id = 5 pad_index = 5 frame_num = 277 in_system_timestamp = 1569318821610.413086 out_system_timestamp = 1569318821697.624023 component_latency = 87.210938
Comp name = nvv4l2decoder4 in_system_timestamp = 1569318821354.995117 out_system_timestamp = 1569318821610.304932 component latency= 255.309814
Comp name = src_bin_muxer source_id = 6 pad_index = 6 frame_num = 277 in_system_timestamp = 1569318821610.350098 out_system_timestamp = 1569318821697.624023 component_latency = 87.273926
Comp name = nvv4l2decoder0 in_system_timestamp = 1569318821354.673096 out_system_timestamp = 1569318821610.544922 component latency= 255.871826
Comp name = src_bin_muxer source_id = 0 pad_index = 0 frame_num = 275 in_system_timestamp = 1569318821610.593994 out_system_timestamp = 1569318821697.624023 component_latency = 87.030029
Comp name = nvv4l2decoder1 in_system_timestamp = 1569318821354.647949 out_system_timestamp = 1569318821609.947998 component latency= 255.300049
Comp name = src_bin_muxer source_id = 1 pad_index = 1 frame_num = 277 in_system_timestamp = 1569318821609.997070 out_system_timestamp = 1569318821697.624023 component_latency = 87.626953
Comp name = nvv4l2decoder3 in_system_timestamp = 1569318821354.810059 out_system_timestamp = 1569318821610.497070 component latency= 255.687012
Comp name = src_bin_muxer source_id = 2 pad_index = 2 frame_num = 277 in_system_timestamp = 1569318821610.543945 out_system_timestamp = 1569318821697.624023 component_latency = 87.080078
Comp name = nvv4l2decoder2 in_system_timestamp = 1569318821354.523926 out_system_timestamp = 1569318821609.857910 component latency= 255.333984
Comp name = src_bin_muxer source_id = 3 pad_index = 3 frame_num = 277 in_system_timestamp = 1569318821609.902100 out_system_timestamp = 1569318821697.624023 component_latency = 87.721924
Comp name = nvv4l2decoder6 in_system_timestamp = 1569318821354.764893 out_system_timestamp = 1569318821617.280029 component latency= 262.515137
Comp name = src_bin_muxer source_id = 4 pad_index = 4 frame_num = 274 in_system_timestamp = 1569318821617.340088 out_system_timestamp = 1569318821697.624023 component_latency = 80.283936
Comp name = primary_gie_classifier in_system_timestamp = 1569318821697.684082 out_system_timestamp = 1569318821954.092041 component latency= 256.407959
Comp name = tracking_tracker in_system_timestamp = 1569318821954.103027 out_system_timestamp = 1569318821954.527100 component latency= 0.424072
Comp name = secondary_gie_0 in_system_timestamp = 1569318821954.585938 out_system_timestamp = 1569318821983.308105 component latency= 28.722168
Comp name = secondary_gie_1 in_system_timestamp = 1569318821954.590088 out_system_timestamp = 1569318821983.291016 component latency= 28.700928
Comp name = secondary_gie_2 in_system_timestamp = 1569318821954.601074 out_system_timestamp = 1569318821983.267090 component latency= 28.666016
Comp name = tiled_display_tiler in_system_timestamp = 1569318821984.364990 out_system_timestamp = 1569318821993.687012 component latency= 9.322021
Source id = 5 Frame_num = 277 Frame latency = 639.427979 (ms)
Source id = 6 Frame_num = 277 Frame latency = 638.984863 (ms)
Source id = 0 Frame_num = 275 Frame latency = 639.306885 (ms)
Source id = 1 Frame_num = 277 Frame latency = 639.332031 (ms)
Source id = 2 Frame_num = 277 Frame latency = 639.169922 (ms)
Source id = 3 Frame_num = 277 Frame latency = 639.456055 (ms)
Source id = 4 Frame_num = 274 Frame latency = 639.215088 (ms)
**PERF: 11.76 (11.34) 11.76 (11.32) 11.76 (11.32) 11.76 (11.32) 11.76 (11.37) 11.76 (11.32) 11.76 (11.32)

How is 14ms comparable with the 639ms?? Is it because of parallaelism? or are multiple yolov3’s are being instantiated here??

Thanks in advance

Hi,

I am running 1080p and getting ~70fps peak performance (no frame drops) which means the latency is 1/70 = 14ms.

Can you specify how you got those numbers ?

Hi, if i send the 1080p10 videos into deepstream4.0 sdk with configuration : yolov3 as primary (batch 7) and resnet18 (batch 16) as secondary and iou tracker - i see that beyond 7 sources, the fps reduces to less than input fps i.e. 10fps. so i determine that 70fps is the peak performance if i have to do synchronous video + ml.

Thanks.

Hi,

You may be reaching max throughput at 70fps but doesn’t imply that the latency is 14ms. All operations of decoding, muxing, infer, tracking etc are pipelined and the numbers you are seeing in the log is what each operation is taking. 640ms you are seeing in the logs is what it takes for the frame to travel through the entire pipeline.