An understanding of the delay result produced by latency_measurement_buf_probe

Hi Fiona,

  1. I have set the width and height of streammux to 640, and the measured streammux latency is 40 ms. Is this result correct? (test3 app)
  2. I noticed that nvinfer also resizes the image. Why is the overall latency of nvinfer only about 30 ms?

If you are using deepstream-test3 to test the performance, please set it to performance mode with “export NVDS_TEST3_PERF_MODE=1”.

We tried "export NVDS_TEST3_PERF_MODE=1"in test3, but the result was still around 40ms.(set streammux to 604*640)

Have you collected the GPU monitoring data with “tegrastats” while running the test3 sample in performance mode?

Hi Fiona,

  1. We are already running with “export NVDS_TEST3_PERF_MODE=1”.
  2. If we want to convert the image resolution and format after nvv4l2decoder and nvstreammux, what do you recommend?

Please use the “tegrastats” command to get the GPU usage log when you run the test3 app.

Yes, I am aware of that. Please measure the GPU usage with the above command.

I don’t understand your request. What is the purpose of “transforming the image scale and format after nvv4l2decoder and nvstreammux”? Do you mean that after the video decoder you want the downstream plugins to handle the video frames at some designated resolution rather than at their original resolution?

What is the purpose of scaling and format conversion after nvstreammux?

Yes, we want to use frames with a specified resolution and format in the downstream plugins. We are also considering obtaining data from GPU memory after “nvstreammux” and processing it using methods other than plugins.

nvstreammux can only convert the frame resolution. To convert the format (and, if needed, the resolution) you can put nvvideoconvert after nvstreammux; nvvideoconvert can convert the batched frames.

If your input streams all have the same resolution, we suggest setting the nvstreammux width and height to the original video’s resolution to avoid duplicated scaling.
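
For illustration, a minimal sketch of such a pipeline with a single RTSP source; the placeholder URI, the 640×640 size, and the RGBA format are only example values, and live-source/batched-push-timeout are typical nvstreammux settings for live inputs, not values from this thread:

```c
/* Sketch: batch with nvstreammux, then convert resolution/format of the
 * batched frames with nvvideoconvert + capsfilter. Error handling trimmed. */
#include <gst/gst.h>

int
main (int argc, char *argv[])
{
  GError *err = NULL;
  GstElement *pipeline;

  gst_init (&argc, &argv);

  /* rtsp://<camera-uri> is a placeholder; replace it with a real stream. */
  pipeline = gst_parse_launch (
      "uridecodebin uri=rtsp://<camera-uri> ! m.sink_0 "
      "nvstreammux name=m batch-size=1 width=640 height=640 "
      "live-source=1 batched-push-timeout=40000 ! "
      "nvvideoconvert ! video/x-raw(memory:NVMM),format=RGBA ! fakesink",
      &err);
  if (!pipeline) {
    g_printerr ("Parse error: %s\n", err ? err->message : "unknown");
    return -1;
  }

  gst_element_set_state (pipeline, GST_STATE_PLAYING);
  g_main_loop_run (g_main_loop_new (NULL, FALSE));
  return 0;
}
```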

When running test3, we execute “tegrastats” and get the following log:

12-23-2024 17:07:18 RAM 8646/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [28%@2188,15%@2188,18%@2188,17%@2188,20%@729,15%@729,18%@729,18%@729] EMC_FREQ 0% GR3D_FREQ 5% CV0@-256C CPU@40.093C Tboard@30C SOC2@37.343C Tdiode@30.5C SOC0@37.562C CV1@-256C GPU@36.406C tj@40.343C SOC1@37.218C CV2@-256C
12-23-2024 17:07:19 RAM 8647/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [27%@2188,16%@2188,17%@2188,16%@2188,18%@729,18%@729,16%@729,24%@729] EMC_FREQ 0% GR3D_FREQ 35% CV0@-256C CPU@40.312C Tboard@30C SOC2@37.406C Tdiode@30.5C SOC0@37.625C CV1@-256C GPU@35.937C tj@40.062C SOC1@37.25C CV2@-256C
12-23-2024 17:07:20 RAM 8650/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [26%@729,15%@729,22%@729,24%@729,22%@729,25%@729,16%@729,23%@729] EMC_FREQ 0% GR3D_FREQ 53% CV0@-256C CPU@40.187C Tboard@30C SOC2@37.468C Tdiode@30.5C SOC0@37.875C CV1@-256C GPU@36.531C tj@40.25C SOC1@37.375C CV2@-256C
12-23-2024 17:07:21 RAM 8650/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [24%@729,12%@729,15%@729,15%@729,20%@729,26%@729,20%@729,20%@729] EMC_FREQ 0% GR3D_FREQ 0% CV0@-256C CPU@40.156C Tboard@30C SOC2@37.343C Tdiode@30.5C SOC0@37.593C CV1@-256C GPU@36.5C tj@40.156C SOC1@37.406C CV2@-256C
12-23-2024 17:07:22 RAM 8650/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [28%@729,14%@729,17%@729,17%@729,19%@729,21%@729,23%@729,15%@729] EMC_FREQ 0% GR3D_FREQ 81% CV0@-256C CPU@40.281C Tboard@30C SOC2@37.406C Tdiode@30.5C SOC0@37.656C CV1@-256C GPU@36.531C tj@40.281C SOC1@37.25C CV2@-256C
12-23-2024 17:07:23 RAM 8651/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [27%@2188,21%@2188,21%@2188,18%@2188,20%@729,23%@729,17%@729,19%@729] EMC_FREQ 0% GR3D_FREQ 89% CV0@-256C CPU@40.375C Tboard@30C SOC2@37.375C Tdiode@30.5C SOC0@37.656C CV1@-256C GPU@36.281C tj@40.375C SOC1@37.218C CV2@-256C
12-23-2024 17:07:24 RAM 8651/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [25%@1036,16%@1036,18%@1036,16%@1036,18%@729,20%@729,19%@729,15%@729] EMC_FREQ 0% GR3D_FREQ 60% CV0@-256C CPU@40.312C Tboard@30C SOC2@37.5C Tdiode@30.5C SOC0@37.718C CV1@-256C GPU@36.312C tj@40.187C SOC1@37.281C CV2@-256C
12-23-2024 17:07:26 RAM 8651/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [26%@1267,16%@1267,15%@1267,16%@1267,19%@729,20%@729,22%@729,21%@729] EMC_FREQ 0% GR3D_FREQ 33% CV0@-256C CPU@40.218C Tboard@30C SOC2@37.343C Tdiode@30.5C SOC0@37.625C CV1@-256C GPU@36.312C tj@40.312C SOC1@37.312C CV2@-256C
12-23-2024 17:07:27 RAM 8650/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [26%@729,20%@729,22%@729,19%@729,22%@729,20%@729,19%@729,21%@729] EMC_FREQ 0% GR3D_FREQ 62% CV0@-256C CPU@40.218C Tboard@30C SOC2@37.468C Tdiode@30.5C SOC0@37.593C CV1@-256C GPU@36.25C tj@40.375C SOC1@37.218C CV2@-256C
12-23-2024 17:07:28 RAM 8650/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [25%@883,15%@883,15%@883,16%@883,23%@729,22%@729,22%@729,22%@729] EMC_FREQ 0% GR3D_FREQ 29% CV0@-256C CPU@40.187C Tboard@30C SOC2@37.406C Tdiode@30.5C SOC0@37.656C CV1@-256C GPU@36.031C tj@40.062C SOC1@37.281C CV2@-256C
12-23-2024 17:07:29 RAM 8650/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [27%@729,16%@729,18%@729,19%@729,22%@729,18%@729,15%@729,17%@729] EMC_FREQ 0% GR3D_FREQ 65% CV0@-256C CPU@40.531C Tboard@30C SOC2@37.437C Tdiode@30.5C SOC0@37.562C CV1@-256C GPU@36.25C tj@40.375C SOC1@37.187C CV2@-256C
12-23-2024 17:07:30 RAM 8650/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [29%@2188,17%@2188,19%@2188,20%@2188,19%@729,17%@729,18%@729,19%@729] EMC_FREQ 0% GR3D_FREQ 68% CV0@-256C CPU@40.25C Tboard@30C SOC2@37.437C Tdiode@30.5C SOC0@37.593C CV1@-256C GPU@36.312C tj@40.312C SOC1@37.437C CV2@-256C
12-23-2024 17:07:31 RAM 8652/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [25%@1036,21%@1036,21%@1036,20%@1036,18%@729,16%@729,12%@729,17%@729] EMC_FREQ 0% GR3D_FREQ 99% CV0@-256C CPU@40.343C Tboard@30C SOC2@37.468C Tdiode@30.5C SOC0@37.75C CV1@-256C GPU@36.5C tj@40.343C SOC1@37.218C CV2@-256C
12-23-2024 17:07:32 RAM 8653/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [27%@883,16%@883,17%@883,14%@883,21%@729,19%@729,18%@729,12%@729] EMC_FREQ 0% GR3D_FREQ 33% CV0@-256C CPU@40.343C Tboard@30C SOC2@37.437C Tdiode@30.5C SOC0@37.656C CV1@-256C GPU@36.093C tj@40.343C SOC1@37.281C CV2@-256C
12-23-2024 17:07:33 RAM 8654/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [29%@2188,22%@2188,20%@2188,20%@2188,16%@729,20%@729,17%@729,19%@729] EMC_FREQ 0% GR3D_FREQ 55% CV0@-256C CPU@40.281C Tboard@30C SOC2@37.437C Tdiode@30.5C SOC0@37.843C CV1@-256C GPU@36.156C tj@40.281C SOC1@37.406C CV2@-256C
12-23-2024 17:07:34 RAM 8654/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [27%@729,16%@729,20%@729,17%@729,13%@729,17%@729,17%@729,11%@729] EMC_FREQ 0% GR3D_FREQ 52% CV0@-256C CPU@40.218C Tboard@30C SOC2@37.468C Tdiode@30.5C SOC0@37.656C CV1@-256C GPU@36.312C tj@40.218C SOC1@37.25C CV2@-256C
12-23-2024 17:07:35 RAM 8654/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [29%@729,19%@729,20%@729,19%@729,19%@729,23%@729,21%@729,22%@729] EMC_FREQ 0% GR3D_FREQ 45% CV0@-256C CPU@40.375C Tboard@30C SOC2@37.406C Tdiode@30.75C SOC0@37.656C CV1@-256C GPU@36.375C tj@40.375C SOC1@37.218C CV2@-256C
12-23-2024 17:07:36 RAM 8655/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [28%@2188,18%@2188,23%@2188,14%@2188,21%@729,15%@729,20%@729,21%@729] EMC_FREQ 0% GR3D_FREQ 5% CV0@-256C CPU@40.531C Tboard@30C SOC2@37.5C Tdiode@30.5C SOC0@37.687C CV1@-256C GPU@36.281C tj@40.531C SOC1@37.375C CV2@-256C
12-23-2024 17:07:37 RAM 8654/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [23%@2188,13%@2188,14%@2188,15%@2188,29%@806,20%@806,25%@806,25%@806] EMC_FREQ 0% GR3D_FREQ 5% CV0@-256C CPU@40.218C Tboard@30C SOC2@37.406C Tdiode@30.5C SOC0@37.812C CV1@-256C GPU@36.062C tj@40.218C SOC1@37.375C CV2@-256C
12-23-2024 17:07:38 RAM 8654/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [20%@729,14%@729,16%@729,13%@729,22%@729,19%@729,17%@729,18%@729] EMC_FREQ 0% GR3D_FREQ 36% CV0@-256C CPU@40.156C Tboard@30C SOC2@37.406C Tdiode@30.5C SOC0@37.687C CV1@-256C GPU@36.156C tj@40.312C SOC1@37.218C CV2@-256C
12-23-2024 17:07:39 RAM 8654/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [32%@960,18%@960,17%@960,18%@960,17%@729,19%@729,12%@729,18%@729] EMC_FREQ 0% GR3D_FREQ 0% CV0@-256C CPU@40.312C Tboard@30C SOC2@37.468C Tdiode@30.5C SOC0@37.593C CV1@-256C GPU@36.062C tj@40.5C SOC1@37.281C CV2@-256C
12-23-2024 17:07:40 RAM 8654/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [25%@2188,14%@2188,15%@2188,15%@2188,22%@729,19%@729,20%@729,21%@729] EMC_FREQ 0% GR3D_FREQ 30% CV0@-256C CPU@40.406C Tboard@30C SOC2@37.406C Tdiode@30.5C SOC0@37.687C CV1@-256C GPU@36.187C tj@40.406C SOC1@37.312C CV2@-256C
12-23-2024 17:07:41 RAM 8654/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [24%@2188,13%@2188,12%@2188,17%@2188,20%@729,17%@729,20%@729,22%@729] EMC_FREQ 0% GR3D_FREQ 0% CV0@-256C CPU@40.312C Tboard@30C SOC2@37.343C Tdiode@30.75C SOC0@37.718C CV1@-256C GPU@36.093C tj@40.5C SOC1@37.281C CV2@-256C
12-23-2024 17:07:42 RAM 8655/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [24%@806,17%@806,18%@806,13%@806,15%@729,15%@729,22%@729,23%@729] EMC_FREQ 0% GR3D_FREQ 30% CV0@-256C CPU@40.468C Tboard@30C SOC2@37.5C Tdiode@30.75C SOC0@37.75C CV1@-256C GPU@36.093C tj@40.468C SOC1@37.25C CV2@-256C
12-23-2024 17:07:43 RAM 8655/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [26%@2188,19%@2188,16%@2188,16%@2188,20%@729,19%@729,19%@729,16%@729] EMC_FREQ 0% GR3D_FREQ 0% CV0@-256C CPU@40.125C Tboard@30C SOC2@37.406C Tdiode@30.5C SOC0@37.812C CV1@-256C GPU@36.093C tj@40.125C SOC1@37.343C CV2@-256C

What are the inputs? Local files or RTSP streams?

The inputs are 10 RTSP streams from network cameras.

So it is OK to have such data. No one can guarantee that the RTSP stream payloads arrive evenly and on time. If your RTSP server supports TCP, you can enable the TCP transport by setting the rtspsrc “protocols” property to GST_RTSP_LOWER_TRANS_TCP and set a larger “latency” value for rtspsrc. This may make the RTSP streams be handled more evenly, with the side effect of introducing more end-to-end latency.
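
For illustration, since deepstream-test3 creates rtspsrc internally through uridecodebin, these properties can be set in a “source-setup” callback; a minimal sketch (the 2000 ms latency is an illustrative value to tune for your network):

```c
/* Sketch: force TCP transport and a larger jitter buffer on rtspsrc.
 * uridecodebin creates rtspsrc internally, so we configure it when the
 * "source-setup" signal fires. */
#include <gst/gst.h>
#include <gst/rtsp/gstrtsptransport.h> /* GST_RTSP_LOWER_TRANS_TCP */

static void
source_setup_cb (GstElement *uridecodebin, GstElement *source,
    gpointer user_data)
{
  /* Only rtspsrc exposes these properties; skip other source types. */
  if (g_object_class_find_property (G_OBJECT_GET_CLASS (source), "protocols")) {
    g_object_set (source,
        "protocols", GST_RTSP_LOWER_TRANS_TCP, /* TCP only */
        "latency", 2000, /* ms; illustrative value */
        NULL);
  }
}

/* Usage: g_signal_connect (uridecodebin, "source-setup",
 *                          G_CALLBACK (source_setup_cb), NULL); */
```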

Thanks Fiona.
I understand that, but why is the latency of “nvstreammux” with resolution conversion higher than that of “nvinfer”? (I have seen that nvinfer also does resolution conversion and even format conversion.)

Which delay do you refer to?

We ran test3 and measured the latency of “nvstreammux” and “nvinfer” using the latency_measurement_buf_probe method.
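
(For reference, the measurement is done with a pad probe like the one in deepstream-test3; a trimmed sketch, assuming NVDS_ENABLE_LATENCY_MEASUREMENT=1 is exported so the measurement is active. MAX_SOURCES is an illustrative bound on the batch size, not a value from test3.)

```c
/* Sketch: per-component latency probe, as in deepstream-test3.
 * Attach near the sink; needs NVDS_ENABLE_LATENCY_MEASUREMENT=1. */
#include <gst/gst.h>
#include "nvds_latency_meta.h"

#define MAX_SOURCES 16 /* illustrative upper bound on the batch size */

static GstPadProbeReturn
latency_measurement_buf_probe (GstPad *pad, GstPadProbeInfo *info,
    gpointer user_data)
{
  GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER (info);
  NvDsFrameLatencyInfo latency_info[MAX_SOURCES];

  if (nvds_enable_latency_measurement) {
    /* Fills one entry per source in the batch; latency is in ms. */
    guint num = nvds_measure_buffer_latency (buf, latency_info);
    for (guint i = 0; i < num; i++) {
      g_print ("Source id = %d Frame_num = %d Frame latency = %lf (ms)\n",
          latency_info[i].source_id, latency_info[i].frame_num,
          latency_info[i].latency);
    }
  }
  return GST_PAD_PROBE_OK;
}
```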

The latency measured with live stream inputs may not reflect the real processing time, unless you can guarantee that your RTSP server delivers the frames to the client in order, evenly, and on time.

Thanks Fiona

Hi Fiona,
We use an nvv4l2decoder + nvstreammux + nvvideoconvert + capsfilter + fakesink pipeline connected to 10 camera RTSP streams and test the latency of each plugin using the latency_measurement_buf_probe method. The nvv4l2decoder latency is 39 ms, the nvstreammux latency is 80 ms, and the nvvideoconvert (scale and format conversion) latency is 39 ms. The total latency of the three plugins is 163 ms. Are these latencies reasonable?

Our purpose:

  1. We want to use frames with a specified resolution and format in the downstream plugins.
  2. We are also considering obtaining data from GPU memory after “nvstreammux” and processing it using methods other than plugins.

The latency is just a pipeline status measurement; we can’t say whether it is reasonable or not. Many factors impact the latency data, e.g. the network bandwidth and transfer efficiency also impact the downstream latency in the pipeline.

I think it has been discussed in An understanding of the delay result produced by latency_measurement_buf_probe - #29 by Fiona.Chen and An understanding of the delay result produced by latency_measurement_buf_probe - #30 by Fiona.Chen

You can access the GPU memory; we have provided the NvBufSurface interfaces. But they should be used very carefully to avoid impacting the whole pipeline if you want to do the processing outside the pipeline.
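
For illustration, a minimal sketch of reading the batched frames through NvBufSurface in a pad probe (the attach point, e.g. the nvstreammux src pad, and the printout are only examples; field names are from nvbufsurface.h):

```c
/* Sketch: access the batched NvBufSurface from a pad probe. */
#include <gst/gst.h>
#include "nvbufsurface.h"

static GstPadProbeReturn
surface_probe_cb (GstPad *pad, GstPadProbeInfo *info, gpointer user_data)
{
  GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER (info);
  GstMapInfo map;
  NvBufSurface *surf;

  if (!gst_buffer_map (buf, &map, GST_MAP_READ))
    return GST_PAD_PROBE_OK;

  /* For NVMM buffers the mapped data is the NvBufSurface descriptor;
   * surfaceList[i].dataPtr is the GPU memory of batched frame i. */
  surf = (NvBufSurface *) map.data;
  for (guint i = 0; i < surf->numFilled; i++) {
    g_print ("frame %u: %ux%u pitch=%u\n", i,
        surf->surfaceList[i].width, surf->surfaceList[i].height,
        surf->surfaceList[i].pitch);
  }
  /* Keep any work here short (or copy the data out) so the probe does
   * not stall the pipeline, as noted above. */

  gst_buffer_unmap (buf, &map);
  return GST_PAD_PROBE_OK;
}
```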