Latency measurement approaches

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): Jetson
• DeepStream Version: 6.2
• JetPack Version (valid for Jetson only): 5.1.1
• Issue Type (questions, new requirements, bugs): question

Hello, I am trying to compute the latency of each element in the pipeline. I am creating the DeepStream pipeline via Python. Currently, I am aware of a few approaches:

  1. Utilize the NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT="1" and NVDS_ENABLE_LATENCY_MEASUREMENT="1" environment variables along with a probe to access NvDsFrameLatencyInfo, as mentioned here. I get the output shown below.

  2. Create custom probes on the “sink” and “src” pads of a plugin to measure the plugin’s overall delay. I get the output shown below (a minimal sketch of these probes follows this list).
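For reference, this is roughly what I mean by approach 2. The element name and the PTS-keyed bookkeeping are my own simplifications, not DeepStream API; it assumes the element preserves the buffer PTS:

```python
import time

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

# Wall-clock time at which each buffer entered the element, keyed by PTS.
in_times = {}

def sink_pad_probe(pad, info, user_data):
    buf = info.get_buffer()
    in_times[buf.pts] = time.monotonic()
    return Gst.PadProbeReturn.OK

def src_pad_probe(pad, info, user_data):
    buf = info.get_buffer()
    t_in = in_times.pop(buf.pts, None)
    if t_in is not None:
        name = pad.get_parent_element().get_name()
        print(f"{name} latency: {(time.monotonic() - t_in) * 1000:.2f} ms")
    return Gst.PadProbeReturn.OK

def attach_latency_probes(element):
    """Attach the approach-2 probes to one element's sink and src pads."""
    element.get_static_pad("sink").add_probe(Gst.PadProbeType.BUFFER, sink_pad_probe, None)
    element.get_static_pad("src").add_probe(Gst.PadProbeType.BUFFER, src_pad_probe, None)
```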

I am able to get latency information via both approach 1 and approach 2, and I observe similar latency numbers from both. However, for approach 1 I am unable to control the output from Python, since the component-wise output is printed inside nvds_measure_buffer_latency, which I do not have access to.

Questions

  1. Would there be any major downsides or inaccuracies to using custom probes (approach 2) instead of approach 1?

  2. Where/when does NvDsMetaCompLatency get populated? Does setting the two environment variables ensure that it is populated internally? If so, would I only need to create Python bindings for NvDsMetaCompLatency, set the environment variables, and then access the information like this? Or does nvds_measure_buffer_latency still have to be invoked?

  3. For approach 1, as seen in the screenshot, the frame numbers are not changing (frame_num=0) across batches. My current setup uses a CSI camera rather than an RTSP stream as the source - is that the reason? This post mentions that latency measurement via nvds_measure_buffer_latency is mainly for an RTSP source.

  4. For approach 2, I believe this might not work for plugins that do not receive a frame_meta_list from the batch_meta, such as the encoder or streammux. How can I compute latency in that case?

What you described above is the same approach: the latency measurement used in Python is just a native binding, and the latency measurement itself is implemented in C.

Latency measurement only counts the time spent in an element’s transform_ip/transform member functions, and the environment variables above are used as switches to enable it.

nvds_measure_buffer_latency is responsible for obtaining the above information.
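A rough sketch of how this looks from Python, assuming you build your own wrapper around the C function nvds_measure_buffer_latency (stock pyds in this thread does not expose one; my_latency_binding and the osd element name are placeholders):

```python
import os

# The switches are read from the environment, so set them before the
# pipeline is created (the usual recommendation is to export them in the
# shell before launching the application).
os.environ["NVDS_ENABLE_LATENCY_MEASUREMENT"] = "1"
os.environ["NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT"] = "1"

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

def latency_probe(pad, info, user_data):
    buf = info.get_buffer()
    # my_latency_binding.measure_buffer_latency() stands in for whatever
    # custom (cffi/pybind11) wrapper you create around the C function
    # nvds_measure_buffer_latency(); it fills and prints the per-component data.
    # num_sources = my_latency_binding.measure_buffer_latency(hash(buf))
    return Gst.PadProbeReturn.OK

# Typically attached on the sink pad of one of the last elements, e.g.:
# osd.get_static_pad("sink").add_probe(Gst.PadProbeType.BUFFER, latency_probe, None)
```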

That is inaccurate; any valid URI source is fine. frame_num being 0 may be a separate problem.

Please refer to the answer to the second question.


Okay. I have also created the bindings for NvDsMetaCompLatency by following the steps in BINDINGSGUIDE.md and created a probe similar to the one mentioned here. I am able to access the components through pyds.NvDsMetaCompLatency and print their latencies from Python. These latencies match the component latency values that were printed when I created the bindings via cffi. However, my frame number is still 0 across batches for both “bindings approaches”. Please advise why this might be happening. It seems the underlying C library might not be setting the frame number correctly?

Since I am currently using a single source, I could also extract the frame number from NvDsBatchMeta.frame_meta_list.frame_num and use that instead of NvDsMetaCompLatency.frame_num (see the sketch below). However, this will create an issue when I add more sources…
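For completeness, the fallback I am describing is just the standard pyds frame-meta iteration in a buffer probe (the probe name is illustrative):

```python
import pyds
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

def frame_num_probe(pad, info, user_data):
    buf = info.get_buffer()
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(buf))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        # With a single source this frame_num can stand in for
        # NvDsMetaCompLatency.frame_num; with several sources the latency
        # entries would also have to be matched on source_id/pad_index.
        print(f"source {frame_meta.source_id}: frame_num {frame_meta.frame_num}")
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK
```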

This post seems to describe the same behavior that I am observing. Is there a problem with using a v4l2 source with respect to latency measurement?

There has been no update from you for a while, so we are assuming this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

Are you using nvmultistreamtiler in your pipeline?

It is a known issue that nvmultistreamtiler will clear frame_num.

You can count frame_num yourself as a workaround (a sketch follows below).
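A rough sketch of that workaround; the names are illustrative, and the probe should sit on a pad upstream of nvmultistreamtiler (for example the streammux src pad):

```python
from collections import defaultdict

import pyds
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

# Per-source frame counters maintained by the application,
# since the tiler clears frame_num downstream.
frame_counters = defaultdict(int)

def frame_counter_probe(pad, info, user_data):
    buf = info.get_buffer()
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(buf))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        frame_counters[frame_meta.source_id] += 1
        # Use frame_counters[frame_meta.source_id] wherever a reliable
        # frame number is needed instead of frame_meta.frame_num.
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK
```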

If not, can you provide sample code that reproduces the problem to help me analyze it?

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.