Latency measurement for every frame

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): Jetson AGX Orin
• DeepStream Version: 6.2

I have developed a detection and recognition pipeline that accepts a video as input. It is built on the deepstream-test3 Python sample, with both a primary and a secondary nvinfer. Is there a way to measure the processing time in ms of the primary and secondary inference for every frame?

You can refer to this FAQ

Thank you for your answer.
Can you please clarify where the lines below should be added:
export NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1
export NVDS_ENABLE_LATENCY_MEASUREMENT=1

And can you please assist me with applying this to the Python deepstream-test3 sample?

I have enabled both before running the Python sample, but it printed only the encode latency after every frame:
Frame Number=758 Object Count=3
Encode Latency = 15.592041
I didn't get the latency for the other plugins or for the full pipeline.

1. pip install cffi

2. Apply the following patch to deepstream_test_3.py:

diff --git a/apps/deepstream-test3/deepstream_test_3.py b/apps/deepstream-test3/deepstream_test_3.py
index d81ec92..21d2f3b 100755
--- a/apps/deepstream-test3/deepstream_test_3.py
+++ b/apps/deepstream-test3/deepstream_test_3.py
@@ -36,6 +36,28 @@ from common.FPS import PERF_DATA
 
 import pyds
 
+from cffi import FFI
+
+ffi = FFI()
+
+clib = None
+
+ffi.cdef("""
+typedef struct
+{
+  uint32_t source_id;
+  uint32_t frame_num;
+  double comp_in_timestamp;
+  double latency;
+} NvDsFrameLatencyInfo;
+
+uint32_t nvds_measure_buffer_latency(void *buf, NvDsFrameLatencyInfo *latency_info);
+bool nvds_get_enable_latency_measurement();
+""")
+
+# Load the prebuilt DeepStream library (.so) that exports the latency API
+clib = ffi.dlopen("/opt/nvidia/deepstream/deepstream/lib/libnvdsgst_meta.so")
+
 no_display = False
 silent = False
 file_loop = False
@@ -56,6 +78,27 @@ OSD_PROCESS_MODE= 0
 OSD_DISPLAY_TEXT= 1
 pgie_classes_str= ["Vehicle", "TwoWheeler", "Person","RoadSign"]
 
+batch_num = 0
+
+def osd_src_pad_buffer_probe(pad, info, u_data):
+    number_source = u_data
+    gst_buffer = info.get_buffer()
+    if not gst_buffer:
+        print("Unable to get GstBuffer ")
+        return Gst.PadProbeReturn.OK
+    global batch_num
+    if clib.nvds_get_enable_latency_measurement():
+        print(f"************BATCH-NUM = {batch_num}**************")
+        c_gst_buf = ffi.cast("void *", hash(gst_buffer))
+        cNvDsFrameLatencyInfo = ffi.new(f"NvDsFrameLatencyInfo[{number_source}]")
+        sources = clib.nvds_measure_buffer_latency(c_gst_buf, cNvDsFrameLatencyInfo)
+        for i in range(sources):
+            print(f"Source id = {cNvDsFrameLatencyInfo[i].source_id} "
+                  f"Frame_num = {cNvDsFrameLatencyInfo[i].frame_num} "
+                  f"Frame latency = {cNvDsFrameLatencyInfo[i].latency} (ms) ")
+        batch_num += 1
+    return Gst.PadProbeReturn.OK
+
 # pgie_src_pad_buffer_probe  will extract metadata received on tiler sink pad
 # and update params for drawing rectangle, object information etc.
 def pgie_src_pad_buffer_probe(pad,info,u_data):
@@ -199,7 +242,7 @@ def create_source_bin(index,uri):
         return None
     return nbin
 
-def main(args, requested_pgie=None, config=None, disable_probe=False):
+def main(args, requested_pgie=None, config=None, disable_probe=True):
     global perf_data
     perf_data = PERF_DATA(len(args))
 
@@ -380,6 +423,12 @@ def main(args, requested_pgie=None, config=None, disable_probe=False):
             # perf callback function to print fps every 5 sec
             GLib.timeout_add(5000, perf_data.perf_print_callback)
 
+    osd_src_pad=nvosd.get_static_pad("src")
+    if not osd_src_pad:
+        sys.stderr.write(" Unable to get src pad \n")
+    else:
+        osd_src_pad.add_probe(Gst.PadProbeType.BUFFER, osd_src_pad_buffer_probe, number_sources)
+
     # List the sources
     print("Now playing...")
     for i, source in enumerate(args):

3. Execute the following commands in the shell:

export NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1
export NVDS_ENABLE_LATENCY_MEASUREMENT=1

python3 deepstream_test_3.py --no-display -i rtsp://<your-rtsp-uri-0> rtsp://<your-rtsp-uri-1>

You will see a log like the following:

************BATCH-NUM = 504**************
Comp name = nvv4l2decoder1 in_system_timestamp = 1704882298795.132080 out_system_timestamp = 1704882298795.524902               component latency= 0.392822
Comp name = nvstreammux-Stream-muxer source_id = 1 pad_index = 1 frame_num = 504               in_system_timestamp = 1704882298795.561035 out_system_timestamp = 1704882298929.228027               component_latency = 133.666992
Comp name = nvv4l2decoder0 in_system_timestamp = 1704882298928.666016 out_system_timestamp = 1704882298929.083008               component latency= 0.416992
Comp name = nvstreammux-Stream-muxer source_id = 0 pad_index = 0 frame_num = 504               in_system_timestamp = 1704882298929.117920 out_system_timestamp = 1704882298929.229004               component_latency = 0.111084
Comp name = primary-inference in_system_timestamp = 1704882298929.272949 out_system_timestamp = 1704882298929.907959               component latency= 0.635010
Comp name = nvtiler in_system_timestamp = 1704882298930.197998 out_system_timestamp = 1704882298930.374023               component latency= 0.176025
Comp name = convertor in_system_timestamp = 1704882298930.561035 out_system_timestamp = 1704882298930.635010               component latency= 0.073975
Comp name = onscreendisplay in_system_timestamp = 1704882298930.710938 out_system_timestamp = 1704882298930.733887               component latency= 0.022949
Source id = 1 Frame_num = 504 Frame latency = 135.739013671875 (ms) 
Source id = 0 Frame_num = 504 Frame latency = 2.205078125 (ms) 
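
As for where the two export lines go: they are ordinary environment variables, so they can be set in the shell before launching the app, as in step 3 above. As a sketch of an alternative (assuming the DeepStream libraries only read them at runtime), they can also be set from Python at the very top of deepstream_test_3.py, before the pipeline is built:

import os

# Hypothetical alternative to the shell exports in step 3: set the
# variables before any DeepStream element is created, so the libraries
# see them when they read the environment.
os.environ["NVDS_ENABLE_LATENCY_MEASUREMENT"] = "1"
os.environ["NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT"] = "1"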

I'm using a .mp4 input file. Does it have to be RTSP? If so, is there a way to get the same measurements for a .mp4 file input?

Please read the sample code first; any valid URI works:

file:///xxxx/xxxx/*.mp4
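
For example, with a hypothetical local path, the same latency log can be produced with:

python3 deepstream_test_3.py --no-display -i file:///home/user/videos/sample.mp4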

Thank you very much, it worked well on sample 3.

However, my app combines sample 2 and sample 3, and when I added the same code it gave only the output below; no other details were printed.

And with sample 2 it shows a core dump. Can you please advise which lines I have to change to apply the same to sample 2? I only have one input, and I'm not using nvosd for on-screen display in the pipeline.

Regarding sample 3, I have one doubt: each batch takes approximately 350 ms, yet the pipeline FPS is around 30. If each frame takes around 350 ms, we can't be processing 30 FPS.

For latency measurement, there is no difference between the two samples.
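
The probe from the patch above is not tied to nvosd. As a hedged sketch for a pipeline without nvosd (the element name sgie is a placeholder for your secondary nvinfer), attach the same probe to the src pad of the last element before the sink:

# Sketch: reuse osd_src_pad_buffer_probe from the patch above, but attach
# it to the last element of your pipeline. "sgie" is hypothetical; use
# whatever your final element is called.
probe_pad = sgie.get_static_pad("src")
if not probe_pad:
    sys.stderr.write(" Unable to get src pad \n")
else:
    probe_pad.add_probe(Gst.PadProbeType.BUFFER,
                        osd_src_pad_buffer_probe, number_sources)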

Since the latency measurement functions are implemented in C, this example uses cffi for the Python bindings.

Please google some usage examples of cffi.
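
For instance, here is a minimal, self-contained sketch of the cffi pattern the patch uses, binding a libc function instead of the DeepStream library:

from cffi import FFI

ffi = FFI()
# Declare the C signature we want to call, then load the library.
ffi.cdef("size_t strlen(const char *s);")
libc = ffi.dlopen(None)  # None loads the standard C library on *nix
print(libc.strlen(b"deepstream"))  # prints 10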

Performance data depends on device, model, and network (when using rtsp/rtmp etc.).

So, you can run deepstream-app for benchmarking.
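
For example (config name from a typical DeepStream install; adjust to your version), FPS is printed when enable-perf-measurement=1 is set in the [application] group of the config:

cd /opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app
deepstream-app -c source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt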

Thank you, I was able to make it work. What I meant is, for example, as below: each batch takes approximately 120 ms, yet the FPS measurement shows the pipeline running at around 40 FPS. If it takes 120 ms per frame, it should be less than 10 FPS. Which of these two approaches to measuring time is more accurate?
************BATCH-NUM = 86**************
Comp name = nvv4l2decoder0 in_system_timestamp = 1705260664102.220947 out_system_timestamp = 1705260664127.583984 component latency= 25.363037
Comp name = nvstreammux-Stream-muxer source_id = 0 pad_index = 0 frame_num = 86 in_system_timestamp = 1705260664127.603027 out_system_timestamp = 1705260664163.211914 component_latency = 35.608887
Comp name = primary-inference face detector in_system_timestamp = 1705260664163.219971 out_system_timestamp = 1705260664169.677979 component latency= 6.458008
Comp name = secondary-inference face_classifier in_system_timestamp = 1705260664169.686035 out_system_timestamp = 1705260664179.729980 component latency= 10.043945
Source id = 0 Frame_num = 86 Frame latency = 84.154052734375 (ms)

**PERF: {'stream0': 63.98, 'stream1': 0.0}

The PERF number is an average value, obtained by dividing the number of frames processed by the elapsed time.

The frame latency is the delay of each individual frame.

Decoding brings a large delay. Do you use network input such as rtsp/rtmp?

Also, have you set the device to MAXN mode?

I'm using a .mp4 file, and yes, it's at MAXN. Is there any way you suggest to reduce the decoding time?

Yes, I understand, but I think there's a mismatch between the two numbers, especially since both are measured on the same pipeline. Is there any reason it shows a high FPS even though each frame has a high latency?

Because the elements run in parallel, several frames are in flight at once, so the latency of each frame can be much greater than the frame interval implied by the FPS value.

In addition, since the processing speeds of the different elements are uneven, the frame latency fluctuates.

The FPS measurement is the more accurate one for throughput: FPS reflects the real processing speed of the pipeline.
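
A toy illustration of why the two numbers are consistent (hypothetical stage times, not measured from this pipeline):

# Four pipelined stages, each taking 30 ms per frame. Frames move through
# the stages concurrently, so a new frame finishes every 30 ms even though
# each individual frame spends 120 ms inside the pipeline.
stage_ms = [30, 30, 30, 30]

per_frame_latency_ms = sum(stage_ms)   # end-to-end delay: 120 ms
throughput_fps = 1000 / max(stage_ms)  # one frame out every 30 ms

print(per_frame_latency_ms)  # 120
print(throughput_fps)        # ~33.3 FPS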
