An understanding of the delay result produced by latency_measurement_buf_probe

Hi Fiona,

  1. I have set the width and height of streammux to 640, and the measured streammux latency is 40 ms. Is this result correct? (test3 app)
  2. I noticed that nvinfer also resizes the image. Why is the overall latency of nvinfer only about 30 ms?

If you are using deepstream-test3 to test the performance, please set it to performance mode with “export NVDS_TEST3_PERF_MODE=1”.

We tried "export NVDS_TEST3_PERF_MODE=1"in test3, but the result was still around 40ms.(set streammux to 604*640)

Have you collected the GPU monitoring data with “tegrastats” while running the test3 sample in performance mode?

Hi Fiona,

  1. We are already running with “export NVDS_TEST3_PERF_MODE=1”.
  2. If we want to convert the image resolution and format after nvv4l2decoder and nvstreammux, what do you recommend?

Please use the “tegrastats” command to get the GPU usage log when you run the test3 app.

Yes, I am aware of that. Please measure the GPU usage with the above command.

I don’t understand your request. What is the purpose of “transforming the image scale and format after nvv4l2decoder and nvstreammux”? Do you mean that after the video decoder you want the downstream plugins to handle the video frames at some designated resolution rather than at their original resolution?

What is the purpose of scaling and format conversion after nvstreammux?

Yes, we want to use frames with a specified resolution and format in the downstream plugins. We are also considering obtaining data from GPU memory after “nvstreammux” and processing it using methods other than plugins.

nvstreammux can only convert the frame resolution. To convert the format (and, if needed, the resolution) you can put nvvideoconvert after nvstreammux; nvvideoconvert can convert the batched frames.

If your input streams all have the same resolution, we suggest setting the nvstreammux width and height to the original video’s resolution to avoid duplicated scaling.
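
For illustration, a minimal sketch of such a pipeline with a single RTSP source; the placeholder URI, the 640×640 size, and the RGBA format are only example values, and live-source/batched-push-timeout are typical nvstreammux settings for live inputs, not values from this thread:

```c
/* Sketch: batch with nvstreammux, then convert resolution/format of the
 * batched frames with nvvideoconvert + capsfilter. Error handling trimmed. */
#include <gst/gst.h>

int
main (int argc, char *argv[])
{
  GError *err = NULL;
  GstElement *pipeline;

  gst_init (&argc, &argv);

  /* rtsp://<camera-uri> is a placeholder; replace it with a real stream. */
  pipeline = gst_parse_launch (
      "uridecodebin uri=rtsp://<camera-uri> ! m.sink_0 "
      "nvstreammux name=m batch-size=1 width=640 height=640 "
      "live-source=1 batched-push-timeout=40000 ! "
      "nvvideoconvert ! video/x-raw(memory:NVMM),format=RGBA ! fakesink",
      &err);
  if (!pipeline) {
    g_printerr ("Parse error: %s\n", err ? err->message : "unknown");
    return -1;
  }

  gst_element_set_state (pipeline, GST_STATE_PLAYING);
  g_main_loop_run (g_main_loop_new (NULL, FALSE));
  return 0;
}
```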

When running test3, we execute “tegrastats” and get the following log:

12-23-2024 17:07:18 RAM 8646/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [28%@2188,15%@2188,18%@2188,17%@2188,20%@729,15%@729,18%@729,18%@729] EMC_FREQ 0% GR3D_FREQ 5% CV0@-256C CPU@40.093C Tboard@30C SOC2@37.343C Tdiode@30.5C SOC0@37.562C CV1@-256C GPU@36.406C tj@40.343C SOC1@37.218C CV2@-256C
12-23-2024 17:07:19 RAM 8647/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [27%@2188,16%@2188,17%@2188,16%@2188,18%@729,18%@729,16%@729,24%@729] EMC_FREQ 0% GR3D_FREQ 35% CV0@-256C CPU@40.312C Tboard@30C SOC2@37.406C Tdiode@30.5C SOC0@37.625C CV1@-256C GPU@35.937C tj@40.062C SOC1@37.25C CV2@-256C
12-23-2024 17:07:20 RAM 8650/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [26%@729,15%@729,22%@729,24%@729,22%@729,25%@729,16%@729,23%@729] EMC_FREQ 0% GR3D_FREQ 53% CV0@-256C CPU@40.187C Tboard@30C SOC2@37.468C Tdiode@30.5C SOC0@37.875C CV1@-256C GPU@36.531C tj@40.25C SOC1@37.375C CV2@-256C
12-23-2024 17:07:21 RAM 8650/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [24%@729,12%@729,15%@729,15%@729,20%@729,26%@729,20%@729,20%@729] EMC_FREQ 0% GR3D_FREQ 0% CV0@-256C CPU@40.156C Tboard@30C SOC2@37.343C Tdiode@30.5C SOC0@37.593C CV1@-256C GPU@36.5C tj@40.156C SOC1@37.406C CV2@-256C
12-23-2024 17:07:22 RAM 8650/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [28%@729,14%@729,17%@729,17%@729,19%@729,21%@729,23%@729,15%@729] EMC_FREQ 0% GR3D_FREQ 81% CV0@-256C CPU@40.281C Tboard@30C SOC2@37.406C Tdiode@30.5C SOC0@37.656C CV1@-256C GPU@36.531C tj@40.281C SOC1@37.25C CV2@-256C
12-23-2024 17:07:23 RAM 8651/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [27%@2188,21%@2188,21%@2188,18%@2188,20%@729,23%@729,17%@729,19%@729] EMC_FREQ 0% GR3D_FREQ 89% CV0@-256C CPU@40.375C Tboard@30C SOC2@37.375C Tdiode@30.5C SOC0@37.656C CV1@-256C GPU@36.281C tj@40.375C SOC1@37.218C CV2@-256C
12-23-2024 17:07:24 RAM 8651/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [25%@1036,16%@1036,18%@1036,16%@1036,18%@729,20%@729,19%@729,15%@729] EMC_FREQ 0% GR3D_FREQ 60% CV0@-256C CPU@40.312C Tboard@30C SOC2@37.5C Tdiode@30.5C SOC0@37.718C CV1@-256C GPU@36.312C tj@40.187C SOC1@37.281C CV2@-256C
12-23-2024 17:07:26 RAM 8651/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [26%@1267,16%@1267,15%@1267,16%@1267,19%@729,20%@729,22%@729,21%@729] EMC_FREQ 0% GR3D_FREQ 33% CV0@-256C CPU@40.218C Tboard@30C SOC2@37.343C Tdiode@30.5C SOC0@37.625C CV1@-256C GPU@36.312C tj@40.312C SOC1@37.312C CV2@-256C
12-23-2024 17:07:27 RAM 8650/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [26%@729,20%@729,22%@729,19%@729,22%@729,20%@729,19%@729,21%@729] EMC_FREQ 0% GR3D_FREQ 62% CV0@-256C CPU@40.218C Tboard@30C SOC2@37.468C Tdiode@30.5C SOC0@37.593C CV1@-256C GPU@36.25C tj@40.375C SOC1@37.218C CV2@-256C
12-23-2024 17:07:28 RAM 8650/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [25%@883,15%@883,15%@883,16%@883,23%@729,22%@729,22%@729,22%@729] EMC_FREQ 0% GR3D_FREQ 29% CV0@-256C CPU@40.187C Tboard@30C SOC2@37.406C Tdiode@30.5C SOC0@37.656C CV1@-256C GPU@36.031C tj@40.062C SOC1@37.281C CV2@-256C
12-23-2024 17:07:29 RAM 8650/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [27%@729,16%@729,18%@729,19%@729,22%@729,18%@729,15%@729,17%@729] EMC_FREQ 0% GR3D_FREQ 65% CV0@-256C CPU@40.531C Tboard@30C SOC2@37.437C Tdiode@30.5C SOC0@37.562C CV1@-256C GPU@36.25C tj@40.375C SOC1@37.187C CV2@-256C
12-23-2024 17:07:30 RAM 8650/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [29%@2188,17%@2188,19%@2188,20%@2188,19%@729,17%@729,18%@729,19%@729] EMC_FREQ 0% GR3D_FREQ 68% CV0@-256C CPU@40.25C Tboard@30C SOC2@37.437C Tdiode@30.5C SOC0@37.593C CV1@-256C GPU@36.312C tj@40.312C SOC1@37.437C CV2@-256C
12-23-2024 17:07:31 RAM 8652/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [25%@1036,21%@1036,21%@1036,20%@1036,18%@729,16%@729,12%@729,17%@729] EMC_FREQ 0% GR3D_FREQ 99% CV0@-256C CPU@40.343C Tboard@30C SOC2@37.468C Tdiode@30.5C SOC0@37.75C CV1@-256C GPU@36.5C tj@40.343C SOC1@37.218C CV2@-256C
12-23-2024 17:07:32 RAM 8653/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [27%@883,16%@883,17%@883,14%@883,21%@729,19%@729,18%@729,12%@729] EMC_FREQ 0% GR3D_FREQ 33% CV0@-256C CPU@40.343C Tboard@30C SOC2@37.437C Tdiode@30.5C SOC0@37.656C CV1@-256C GPU@36.093C tj@40.343C SOC1@37.281C CV2@-256C
12-23-2024 17:07:33 RAM 8654/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [29%@2188,22%@2188,20%@2188,20%@2188,16%@729,20%@729,17%@729,19%@729] EMC_FREQ 0% GR3D_FREQ 55% CV0@-256C CPU@40.281C Tboard@30C SOC2@37.437C Tdiode@30.5C SOC0@37.843C CV1@-256C GPU@36.156C tj@40.281C SOC1@37.406C CV2@-256C
12-23-2024 17:07:34 RAM 8654/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [27%@729,16%@729,20%@729,17%@729,13%@729,17%@729,17%@729,11%@729] EMC_FREQ 0% GR3D_FREQ 52% CV0@-256C CPU@40.218C Tboard@30C SOC2@37.468C Tdiode@30.5C SOC0@37.656C CV1@-256C GPU@36.312C tj@40.218C SOC1@37.25C CV2@-256C
12-23-2024 17:07:35 RAM 8654/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [29%@729,19%@729,20%@729,19%@729,19%@729,23%@729,21%@729,22%@729] EMC_FREQ 0% GR3D_FREQ 45% CV0@-256C CPU@40.375C Tboard@30C SOC2@37.406C Tdiode@30.75C SOC0@37.656C CV1@-256C GPU@36.375C tj@40.375C SOC1@37.218C CV2@-256C
12-23-2024 17:07:36 RAM 8655/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [28%@2188,18%@2188,23%@2188,14%@2188,21%@729,15%@729,20%@729,21%@729] EMC_FREQ 0% GR3D_FREQ 5% CV0@-256C CPU@40.531C Tboard@30C SOC2@37.5C Tdiode@30.5C SOC0@37.687C CV1@-256C GPU@36.281C tj@40.531C SOC1@37.375C CV2@-256C
12-23-2024 17:07:37 RAM 8654/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [23%@2188,13%@2188,14%@2188,15%@2188,29%@806,20%@806,25%@806,25%@806] EMC_FREQ 0% GR3D_FREQ 5% CV0@-256C CPU@40.218C Tboard@30C SOC2@37.406C Tdiode@30.5C SOC0@37.812C CV1@-256C GPU@36.062C tj@40.218C SOC1@37.375C CV2@-256C
12-23-2024 17:07:38 RAM 8654/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [20%@729,14%@729,16%@729,13%@729,22%@729,19%@729,17%@729,18%@729] EMC_FREQ 0% GR3D_FREQ 36% CV0@-256C CPU@40.156C Tboard@30C SOC2@37.406C Tdiode@30.5C SOC0@37.687C CV1@-256C GPU@36.156C tj@40.312C SOC1@37.218C CV2@-256C
12-23-2024 17:07:39 RAM 8654/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [32%@960,18%@960,17%@960,18%@960,17%@729,19%@729,12%@729,18%@729] EMC_FREQ 0% GR3D_FREQ 0% CV0@-256C CPU@40.312C Tboard@30C SOC2@37.468C Tdiode@30.5C SOC0@37.593C CV1@-256C GPU@36.062C tj@40.5C SOC1@37.281C CV2@-256C
12-23-2024 17:07:40 RAM 8654/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [25%@2188,14%@2188,15%@2188,15%@2188,22%@729,19%@729,20%@729,21%@729] EMC_FREQ 0% GR3D_FREQ 30% CV0@-256C CPU@40.406C Tboard@30C SOC2@37.406C Tdiode@30.5C SOC0@37.687C CV1@-256C GPU@36.187C tj@40.406C SOC1@37.312C CV2@-256C
12-23-2024 17:07:41 RAM 8654/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [24%@2188,13%@2188,12%@2188,17%@2188,20%@729,17%@729,20%@729,22%@729] EMC_FREQ 0% GR3D_FREQ 0% CV0@-256C CPU@40.312C Tboard@30C SOC2@37.343C Tdiode@30.75C SOC0@37.718C CV1@-256C GPU@36.093C tj@40.5C SOC1@37.281C CV2@-256C
12-23-2024 17:07:42 RAM 8655/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [24%@806,17%@806,18%@806,13%@806,15%@729,15%@729,22%@729,23%@729] EMC_FREQ 0% GR3D_FREQ 30% CV0@-256C CPU@40.468C Tboard@30C SOC2@37.5C Tdiode@30.75C SOC0@37.75C CV1@-256C GPU@36.093C tj@40.468C SOC1@37.25C CV2@-256C
12-23-2024 17:07:43 RAM 8655/30537MB (lfb 3895x4MB) SWAP 0/15269MB (cached 0MB) CPU [26%@2188,19%@2188,16%@2188,16%@2188,20%@729,19%@729,19%@729,16%@729] EMC_FREQ 0% GR3D_FREQ 0% CV0@-256C CPU@40.125C Tboard@30C SOC2@37.406C Tdiode@30.5C SOC0@37.812C CV1@-256C GPU@36.093C tj@40.125C SOC1@37.343C CV2@-256C

What are the inputs? Local files or RTSP streams?

The inputs are 10 RTSP streams from network cameras.

So it is OK to have such data. No one can guarantee that the RTSP stream payloads arrive evenly and on time. If your RTSP server supports TCP, you can enable the TCP transport by setting the rtspsrc “protocols” property to GST_RTSP_LOWER_TRANS_TCP and set a larger “latency” value for rtspsrc. This may make the RTSP streams be handled more evenly, with the side effect of introducing more end-to-end latency.
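
For illustration, since deepstream-test3 creates rtspsrc internally through uridecodebin, these properties can be set in a “source-setup” callback; a minimal sketch (the 2000 ms latency is an illustrative value to tune for your network):

```c
/* Sketch: force TCP transport and a larger jitter buffer on rtspsrc.
 * uridecodebin creates rtspsrc internally, so we configure it when the
 * "source-setup" signal fires. */
#include <gst/gst.h>
#include <gst/rtsp/gstrtsptransport.h> /* GST_RTSP_LOWER_TRANS_TCP */

static void
source_setup_cb (GstElement *uridecodebin, GstElement *source,
    gpointer user_data)
{
  /* Only rtspsrc exposes these properties; skip other source types. */
  if (g_object_class_find_property (G_OBJECT_GET_CLASS (source), "protocols")) {
    g_object_set (source,
        "protocols", GST_RTSP_LOWER_TRANS_TCP, /* TCP only */
        "latency", 2000, /* ms; illustrative value */
        NULL);
  }
}

/* Usage: g_signal_connect (uridecodebin, "source-setup",
 *                          G_CALLBACK (source_setup_cb), NULL); */
```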

Thanks Fiona.
I understand that, but why is the latency of “nvstreammux” with resolution conversion higher than that of “nvinfer”? (I have seen that nvinfer also does resolution conversion and even format conversion.)

Which delay do you refer to?

We ran test3 and measured the latency of “nvstreammux” and “nvinfer” using the latency_measurement_buf_probe method.
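
(For reference, the measurement is done with a pad probe like the one in deepstream-test3; a trimmed sketch, assuming NVDS_ENABLE_LATENCY_MEASUREMENT=1 is exported so the measurement is active. MAX_SOURCES is an illustrative bound on the batch size, not a value from test3.)

```c
/* Sketch: per-component latency probe, as in deepstream-test3.
 * Attach near the sink; needs NVDS_ENABLE_LATENCY_MEASUREMENT=1. */
#include <gst/gst.h>
#include "nvds_latency_meta.h"

#define MAX_SOURCES 16 /* illustrative upper bound on the batch size */

static GstPadProbeReturn
latency_measurement_buf_probe (GstPad *pad, GstPadProbeInfo *info,
    gpointer user_data)
{
  GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER (info);
  NvDsFrameLatencyInfo latency_info[MAX_SOURCES];

  if (nvds_enable_latency_measurement) {
    /* Fills one entry per source in the batch; latency is in ms. */
    guint num = nvds_measure_buffer_latency (buf, latency_info);
    for (guint i = 0; i < num; i++) {
      g_print ("Source id = %d Frame_num = %d Frame latency = %lf (ms)\n",
          latency_info[i].source_id, latency_info[i].frame_num,
          latency_info[i].latency);
    }
  }
  return GST_PAD_PROBE_OK;
}
```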

The latency measured with live stream inputs may not reflect the real processing time, unless you can guarantee that your RTSP server delivers the frames to the client in order, evenly, and on time.

Thanks Fiona

Hi Fiona,
We use an nvv4l2decoder + nvstreammux + nvvideoconvert + capsfilter + fakesink pipeline connected to 10 camera RTSP streams and test the latency of each plugin using the latency_measurement_buf_probe method. The nvv4l2decoder latency is 39 ms, the nvstreammux latency is 80 ms, and the nvvideoconvert (scale and format conversion) latency is 39 ms. The total latency of the three plugins is 163 ms. Are these latencies reasonable?

Our purpose:

  1. We want to use frames with a specified resolution and format in the downstream plugins.
  2. We are also considering obtaining data from GPU memory after “nvstreammux” and processing it using methods other than plugins.

The latency is just a pipeline status measurement; we can’t say whether it is reasonable or not. Many factors impact the latency data, e.g. the network bandwidth and transfer efficiency also impact the downstream latency in the pipeline.

I think it has been discussed in An understanding of the delay result produced by latency_measurement_buf_probe - #29 by Fiona.Chen and An understanding of the delay result produced by latency_measurement_buf_probe - #30 by Fiona.Chen

You can access the GPU memory; we have provided the NvBufSurface interfaces. But they should be used very carefully to avoid impacting the whole pipeline if you want to do the processing outside the pipeline.
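
For illustration, a minimal sketch of reading the batched frames through NvBufSurface in a pad probe (the attach point, e.g. the nvstreammux src pad, and the printout are only examples; field names are from nvbufsurface.h):

```c
/* Sketch: access the batched NvBufSurface from a pad probe. */
#include <gst/gst.h>
#include "nvbufsurface.h"

static GstPadProbeReturn
surface_probe_cb (GstPad *pad, GstPadProbeInfo *info, gpointer user_data)
{
  GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER (info);
  GstMapInfo map;
  NvBufSurface *surf;

  if (!gst_buffer_map (buf, &map, GST_MAP_READ))
    return GST_PAD_PROBE_OK;

  /* For NVMM buffers the mapped data is the NvBufSurface descriptor;
   * surfaceList[i].dataPtr is the GPU memory of batched frame i. */
  surf = (NvBufSurface *) map.data;
  for (guint i = 0; i < surf->numFilled; i++) {
    g_print ("frame %u: %ux%u pitch=%u\n", i,
        surf->surfaceList[i].width, surf->surfaceList[i].height,
        surf->surfaceList[i].pitch);
  }
  /* Keep any work here short (or copy the data out) so the probe does
   * not stall the pipeline, as noted above. */

  gst_buffer_unmap (buf, &map);
  return GST_PAD_PROBE_OK;
}
```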