Latency measurement for every frame

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): Jetson AGX Orin
• DeepStream Version: 6.2

I have developed a detection and recognition pipeline that accepts a video as input. It is built on the deepstream-test3 Python sample, with both a primary and a secondary nvinfer. Is there a way to measure the processing time in ms of the primary and secondary inference for every frame?

You can refer to this FAQ

Thank you for your answer.
Can you please clarify where the lines below should be added:
export NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1
export NVDS_ENABLE_LATENCY_MEASUREMENT=1

And can you please assist me with applying this to the Python deepstream-test3 sample?

I have enabled both before running the Python sample, but it printed only the encode latency after every frame:
Frame Number=758 Object Count=3
Encode Latency = 15.592041
I didn't get the latency for the other plugins or for the full pipeline.

1. pip install cffi

2. Apply the following patch to deepstream_test_3.py:

diff --git a/apps/deepstream-test3/deepstream_test_3.py b/apps/deepstream-test3/deepstream_test_3.py
index d81ec92..21d2f3b 100755
--- a/apps/deepstream-test3/deepstream_test_3.py
+++ b/apps/deepstream-test3/deepstream_test_3.py
@@ -36,6 +36,28 @@ from common.FPS import PERF_DATA
 
 import pyds
 
+from cffi import FFI
+
+ffi = FFI()
+
+clib = None
+
+ffi.cdef("""
+typedef struct
+{
+  uint32_t source_id;
+  uint32_t frame_num;
+  double comp_in_timestamp;
+  double latency;
+} NvDsFrameLatencyInfo;
+
+uint32_t nvds_measure_buffer_latency(void *buf, NvDsFrameLatencyInfo *latency_info);
+bool nvds_get_enable_latency_measurement();
+""")
+
+# Load the prebuilt DeepStream library (.so) that exports the latency API
+clib = ffi.dlopen("/opt/nvidia/deepstream/deepstream/lib/libnvdsgst_meta.so")
+
 no_display = False
 silent = False
 file_loop = False
@@ -56,6 +78,27 @@ OSD_PROCESS_MODE= 0
 OSD_DISPLAY_TEXT= 1
 pgie_classes_str= ["Vehicle", "TwoWheeler", "Person","RoadSign"]
 
+batch_num = 0
+
+def osd_src_pad_buffer_probe(pad, info, u_data):
+    number_source = u_data
+    gst_buffer = info.get_buffer()
+    if not gst_buffer:
+        print("Unable to get GstBuffer ")
+        return Gst.PadProbeReturn.OK
+    global batch_num
+    if clib.nvds_get_enable_latency_measurement():
+        print(f"************BATCH-NUM = {batch_num}**************")
+        c_gst_buf = ffi.cast("void *", hash(gst_buffer))
+        cNvDsFrameLatencyInfo = ffi.new(f"NvDsFrameLatencyInfo[{number_source}]")
+        sources = clib.nvds_measure_buffer_latency(c_gst_buf, cNvDsFrameLatencyInfo)
+        for i in range(sources):
+            print(f"Source id = {cNvDsFrameLatencyInfo[i].source_id} "
+                  f"Frame_num = {cNvDsFrameLatencyInfo[i].frame_num} "
+                  f"Frame latency = {cNvDsFrameLatencyInfo[i].latency} (ms) ")
+        batch_num += 1
+    return Gst.PadProbeReturn.OK
+
 # pgie_src_pad_buffer_probe  will extract metadata received on tiler sink pad
 # and update params for drawing rectangle, object information etc.
 def pgie_src_pad_buffer_probe(pad,info,u_data):
@@ -199,7 +242,7 @@ def create_source_bin(index,uri):
         return None
     return nbin
 
-def main(args, requested_pgie=None, config=None, disable_probe=False):
+def main(args, requested_pgie=None, config=None, disable_probe=True):
     global perf_data
     perf_data = PERF_DATA(len(args))
 
@@ -380,6 +423,12 @@ def main(args, requested_pgie=None, config=None, disable_probe=False):
             # perf callback function to print fps every 5 sec
             GLib.timeout_add(5000, perf_data.perf_print_callback)
 
+    osd_src_pad=nvosd.get_static_pad("src")
+    if not osd_src_pad:
+        sys.stderr.write(" Unable to get src pad \n")
+    else:
+        osd_src_pad.add_probe(Gst.PadProbeType.BUFFER, osd_src_pad_buffer_probe, number_sources)
+
     # List the sources
     print("Now playing...")
     for i, source in enumerate(args):

3. Execute the following commands in the shell:

export NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1
export NVDS_ENABLE_LATENCY_MEASUREMENT=1

python3 deepstream_test_3.py --no-display -i rtsp://<your-rtsp-uri-0> rtsp://<your-rtsp-uri-1>

You will see a log like the following:

************BATCH-NUM = 504**************
Comp name = nvv4l2decoder1 in_system_timestamp = 1704882298795.132080 out_system_timestamp = 1704882298795.524902               component latency= 0.392822
Comp name = nvstreammux-Stream-muxer source_id = 1 pad_index = 1 frame_num = 504               in_system_timestamp = 1704882298795.561035 out_system_timestamp = 1704882298929.228027               component_latency = 133.666992
Comp name = nvv4l2decoder0 in_system_timestamp = 1704882298928.666016 out_system_timestamp = 1704882298929.083008               component latency= 0.416992
Comp name = nvstreammux-Stream-muxer source_id = 0 pad_index = 0 frame_num = 504               in_system_timestamp = 1704882298929.117920 out_system_timestamp = 1704882298929.229004               component_latency = 0.111084
Comp name = primary-inference in_system_timestamp = 1704882298929.272949 out_system_timestamp = 1704882298929.907959               component latency= 0.635010
Comp name = nvtiler in_system_timestamp = 1704882298930.197998 out_system_timestamp = 1704882298930.374023               component latency= 0.176025
Comp name = convertor in_system_timestamp = 1704882298930.561035 out_system_timestamp = 1704882298930.635010               component latency= 0.073975
Comp name = onscreendisplay in_system_timestamp = 1704882298930.710938 out_system_timestamp = 1704882298930.733887               component latency= 0.022949
Source id = 1 Frame_num = 504 Frame latency = 135.739013671875 (ms) 
Source id = 0 Frame_num = 504 Frame latency = 2.205078125 (ms) 
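
As for where the two export lines go: they are ordinary environment variables, so they can be set in the shell before launching the app, as in step 3 above. As a sketch of an alternative (assuming the DeepStream libraries only read them at runtime), they can also be set from Python at the very top of deepstream_test_3.py, before the pipeline is built:

import os

# Hypothetical alternative to the shell exports in step 3: set the
# variables before any DeepStream element is created, so the libraries
# see them when they read the environment.
os.environ["NVDS_ENABLE_LATENCY_MEASUREMENT"] = "1"
os.environ["NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT"] = "1"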

I'm using a .mp4 input file. Does it have to be RTSP? If so, is there a way to get the same measurements for a .mp4 file input?

Please read the sample code first; any valid URI works:

file:///xxxx/xxxx/*.mp4
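
For example, with a hypothetical local path, the same latency log can be produced with:

python3 deepstream_test_3.py --no-display -i file:///home/user/videos/sample.mp4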

Thank you very much, it worked well on sample 3.

However, my app combines sample 2 and sample 3, and when I added the same code it gave only the output below; no other details were printed.

And with sample 2 it shows a core dump. Can you please advise which lines I have to change to apply the same to sample 2? I only have one input, and I'm not using nvosd for on-screen display in the pipeline.

Regarding sample 3, I have one doubt: each batch takes approximately 350 ms, yet the pipeline FPS is around 30. If each frame takes around 350 ms, we can't be processing 30 FPS.

For latency measurement, there is no difference between the two samples.
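
The probe from the patch above is not tied to nvosd. As a hedged sketch for a pipeline without nvosd (the element name sgie is a placeholder for your secondary nvinfer), attach the same probe to the src pad of the last element before the sink:

# Sketch: reuse osd_src_pad_buffer_probe from the patch above, but attach
# it to the last element of your pipeline. "sgie" is hypothetical; use
# whatever your final element is called.
probe_pad = sgie.get_static_pad("src")
if not probe_pad:
    sys.stderr.write(" Unable to get src pad \n")
else:
    probe_pad.add_probe(Gst.PadProbeType.BUFFER,
                        osd_src_pad_buffer_probe, number_sources)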

Since the latency measurement functions are implemented in C, this example uses cffi for the Python bindings.

Please google some usage examples of cffi.
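
For instance, here is a minimal, self-contained sketch of the cffi pattern the patch uses, binding a libc function instead of the DeepStream library:

from cffi import FFI

ffi = FFI()
# Declare the C signature we want to call, then load the library.
ffi.cdef("size_t strlen(const char *s);")
libc = ffi.dlopen(None)  # None loads the standard C library on *nix
print(libc.strlen(b"deepstream"))  # prints 10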

Performance data depends on device, model, and network (when using rtsp/rtmp etc.).

So, you can run deepstream-app for benchmarking.
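
For example (config name from a typical DeepStream install; adjust to your version), FPS is printed when enable-perf-measurement=1 is set in the [application] group of the config:

cd /opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app
deepstream-app -c source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt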

Thank you, I was able to make it work. What I meant is, for example, as below: each batch takes approximately 120 ms, yet the FPS measurement shows the pipeline running at around 40 FPS. If it takes 120 ms per frame, it should be less than 10 FPS. Which of these two approaches to measuring time is more accurate?
************BATCH-NUM = 86**************
Comp name = nvv4l2decoder0 in_system_timestamp = 1705260664102.220947 out_system_timestamp = 1705260664127.583984 component latency= 25.363037
Comp name = nvstreammux-Stream-muxer source_id = 0 pad_index = 0 frame_num = 86 in_system_timestamp = 1705260664127.603027 out_system_timestamp = 1705260664163.211914 component_latency = 35.608887
Comp name = primary-inference face detector in_system_timestamp = 1705260664163.219971 out_system_timestamp = 1705260664169.677979 component latency= 6.458008
Comp name = secondary-inference face_classifier in_system_timestamp = 1705260664169.686035 out_system_timestamp = 1705260664179.729980 component latency= 10.043945
Source id = 0 Frame_num = 86 Frame latency = 84.154052734375 (ms)

**PERF: {'stream0': 63.98, 'stream1': 0.0}

The PERF number is an average value, obtained by dividing the number of frames processed by the elapsed time.

The frame latency is the delay of each individual frame.

Decoding brings a large delay. Do you use network input such as rtsp/rtmp?

Also, have you set the device to MAXN mode?

I'm using a .mp4 file, and yes, it's at MAXN. Is there any way you suggest to reduce the decoding time?

Yes, I understand, but I think there's a mismatch between the two numbers, especially since both are measured on the same pipeline. Is there any reason it shows a high FPS even though each frame has a high latency?

Because the elements run in parallel, several frames are in flight at once, so the latency of each frame can be much greater than the frame interval implied by the FPS value.

In addition, since the processing speeds of the different elements are uneven, the frame latency fluctuates.

The FPS measurement is the more accurate one for throughput: FPS reflects the real processing speed of the pipeline.
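
A toy illustration of why the two numbers are consistent (hypothetical stage times, not measured from this pipeline):

# Four pipelined stages, each taking 30 ms per frame. Frames move through
# the stages concurrently, so a new frame finishes every 30 ms even though
# each individual frame spends 120 ms inside the pipeline.
stage_ms = [30, 30, 30, 30]

per_frame_latency_ms = sum(stage_ms)   # end-to-end delay: 120 ms
throughput_fps = 1000 / max(stage_ms)  # one frame out every 30 ms

print(per_frame_latency_ms)  # 120
print(throughput_fps)        # ~33.3 FPS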
