Latency time at nvv4l2decoder about 250ms when running deepstream-app

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
Jetson nano devkit

• DeepStream Version
Deepstream 5.0 release version

• JetPack Version (valid for Jetson only)
JetPack4.4

• TensorRT Version
TensorRT: 7.1.0

• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type( questions, new requirements, bugs)
Question

• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

deepstream_app -c ./source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
with latency measurement enabled

I get the latency measurements like this:
BATCH-NUM = 70**
Comp name = nvv4l2decoder0 in_system_timestamp = 1603393881502.975098 out_system_timestamp = 1603393881702.209961 component latency= 199.234863
Comp name = src_bin_muxer source_id = 0 pad_index = 0 frame_num = 70 in_system_timestamp = 1603393881702.681885 out_system_timestamp = 1603393881702.812012 component_latency = 0.130127
Comp name = primary_gie in_system_timestamp = 1603393881703.507080 out_system_timestamp = 1603393881731.337891 component latency= 27.830811
Comp name = tracking_tracker in_system_timestamp = 1603393881731.370117 out_system_timestamp = 1603393881739.822998 component latency= 8.452881
Comp name = demuxer in_system_timestamp = 1603393881739.852051 out_system_timestamp = 1603393881739.962891 component latency= 0.110840
Comp name = osd_conv in_system_timestamp = 1603393881782.778076 out_system_timestamp = 1603393881785.332031 component latency= 2.553955
Comp name = nvosd0 in_system_timestamp = 1603393881785.522949 out_system_timestamp = 1603393881790.571045 component latency= 5.048096
Source id = 0 Frame_num = 70 Frame latency = 346.779785 (ms)

BATCH-NUM = 71**
Comp name = nvv4l2decoder0 in_system_timestamp = 1603393881486.231934 out_system_timestamp = 1603393881736.309082 component latency= 250.077148
Comp name = src_bin_muxer source_id = 0 pad_index = 0 frame_num = 71 in_system_timestamp = 1603393881736.524902 out_system_timestamp = 1603393881736.854980 component_latency = 0.330078
Comp name = primary_gie in_system_timestamp = 1603393881736.961914 out_system_timestamp = 1603393881753.534912 component latency= 16.572998
Comp name = tracking_tracker in_system_timestamp = 1603393881753.562988 out_system_timestamp = 1603393881761.364990 component latency= 7.802002
Comp name = demuxer in_system_timestamp = 1603393881761.406006 out_system_timestamp = 1603393881761.539062 component latency= 0.133057
Comp name = osd_conv in_system_timestamp = 1603393881799.250000 out_system_timestamp = 1603393881802.197021 component latency= 2.947021
Comp name = nvosd0 in_system_timestamp = 1603393881802.654053 out_system_timestamp = 1603393881807.612061 component latency= 4.958008
Source id = 0 Frame_num = 71 Frame latency = 379.805176 (ms)

Time spent on nvv4l2decoder0 is above 200ms, which is concerning.
I am using the power barrel and please find the jtop profile attached. I could not figure out where is wrong, so that the time spent at decoder is so long. Thank you very much.

jtop profile

Here is the config file used:
[application]
enable-perf-measurement=0
perf-measurement-interval-sec=5
#gie-kitti-output-dir=streamscl

[tiled-display]
enable=0
rows=2
columns=2
width=1280
height=720
gpu-id=0
nvbuf-memory-type=0

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=2
uri=file://…/…/streams/sample_1080p_h264.mp4
gpu-id=0
cudadec-memtype=0

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=2
sync=0
source-id=0
gpu-id=0
nvbuf-memory-type=0

[osd]
enable=1
gpu-id=0
border-width=1
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=0
batch-size=1
batched-push-timeout=40000
width=1920
height=1080
enable-padding=0
nvbuf-memory-type=0

[primary-gie]
enable=1
gpu-id=0
model-engine-file=…/…/models/Primary_Detector_Nano/resnet10.caffemodel_b1_gpu0_fp16.engine
batch-size=1
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
interval=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary_nano.txt

[tracker]
enable=1
tracker-width=640
tracker-height=384
ll-lib-file=/opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_mot_klt.so
gpu-id=0
enable-batch-process=1
enable-past-frame=0
display-tracking-id=1

• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

The latency is OK. Latency does not mean the decoding speed. It is a combination result of the input buffer period, decoding speed, reorder, output consumption period. Have you checked the FPS statics of this configuration?

deepstream-app was running a mp4 file with sync = 0, not live source, batch_size = 1, no reorder needed.

**PERF: FPS 0 (Avg)
**PERF: 60.05 (59.86)
**PERF: 59.98 (59.97)
**PERF: 60.02 (59.98)

“input buffer period, decoding speed, reorder, output consumption period” takes 200ms, seems not correlated well with 60 performance FPS reported, a bit confused.

does it mean the latency measurement is not very accurate, we should look at the performance FPS instead.

BATCH-NUM = 17**
Comp name = nvv4l2decoder0 in_system_timestamp = 1603461561387.930908 out_system_timestamp = 1603461561661.123047 component latency= 273.192139
Comp name = src_bin_muxer source_id = 0 pad_index = 0 frame_num = 17 in_system_timestamp = 1603461561661.384033 out_system_timestamp = 1603461561661.688965 component_latency = 0.304932
Comp name = primary_gie in_system_timestamp = 1603461561663.040039 out_system_timestamp = 1603461561680.056885 component latency= 17.016846
Comp name = tracking_tracker in_system_timestamp = 1603461561680.110107 out_system_timestamp = 1603461561688.919922 component latency= 8.809814
Comp name = tiled_display_tiler in_system_timestamp = 1603461561723.388916 out_system_timestamp = 1603461561726.283936 component latency= 2.895020
Comp name = osd_conv in_system_timestamp = 1603461561771.874023 out_system_timestamp = 1603461561773.658936 component latency= 1.784912
Comp name = nvosd0 in_system_timestamp = 1603461561773.862061 out_system_timestamp = 1603461561777.965088 component latency= 4.103027
Source id = 0 Frame_num = 17 Frame latency = 450.603027 (ms)

Latency is mores suitable for loading analysis. For performance measurement, please refer to FPS.