How to test the pipeline latency of deepstream-test3?

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): Jetson
• DeepStream Version: 6.2

How can I test the latency of the individual elements in the deepstream-test3 pipeline?

Please refer to this FAQ.

Please use local files to test the latency. RTSP/HTTP sources may give inaccurate results because of network delays.
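In case the link becomes unavailable, the mechanism is roughly this: export NVDS_ENABLE_LATENCY_MEASUREMENT=1 (and NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1 for the per-element lines) before running the app, and read the per-frame numbers with nvds_measure_buffer_latency() from a buffer probe placed after nvstreammux. A minimal sketch; the probe placement, the nvosd handle, and MAX_SOURCES are illustrative, not prescribed:

#include <gst/gst.h>
#include "nvds_latency_meta.h"

#define MAX_SOURCES 8   /* illustrative: number of input streams */

/* Buffer probe that prints per-source frame latency. Run the app with
 * NVDS_ENABLE_LATENCY_MEASUREMENT=1 (and NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1
 * for the per-element "Comp name = ..." lines) exported in the environment. */
static GstPadProbeReturn
latency_probe_cb (GstPad *pad, GstPadProbeInfo *info, gpointer u_data)
{
    GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER (info);
    NvDsFrameLatencyInfo latency_info[MAX_SOURCES];

    if (nvds_enable_latency_measurement) {
        guint num_sources = nvds_measure_buffer_latency (buf, latency_info);
        for (guint i = 0; i < num_sources; i++) {
            g_print ("Source id = %u Frame_num = %u Frame latency = %lf (ms)\n",
                latency_info[i].source_id, latency_info[i].frame_num,
                latency_info[i].latency);
        }
    }
    return GST_PAD_PROBE_OK;
}

/* Attach the probe on the sink pad of any element after nvstreammux,
 * e.g. the OSD; the choice of element is up to you. */
static void
attach_latency_probe (GstElement *elem_after_mux)
{
    GstPad *pad = gst_element_get_static_pad (elem_after_mux, "sink");
    gst_pad_add_probe (pad, GST_PAD_PROBE_TYPE_BUFFER, latency_probe_cb, NULL, NULL);
    gst_object_unref (pad);
}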

@junshengy
I used the method you mentioned to measure the pipeline latency. My setup has 8 RTSP streams feeding a single model, and the batched-push-timeout property of streammux is set to 40000 µs (40 ms). The output log is as follows:

************BATCH-NUM = 60**************
Comp name = nvv4l2decoder4 in_system_timestamp = 1744601729450.699951 out_system_timestamp = 1744601729460.028076               component latency= 9.328125
Comp name = nvstreammux-stream-muxer source_id = 0 pad_index = 0 frame_num = 59               in_system_timestamp = 1744601729460.087891 out_system_timestamp = 1744601729617.936035               component_latency = 157.848145
Comp name = nvv4l2decoder0 in_system_timestamp = 1744601729450.467041 out_system_timestamp = 1744601729458.347900               component latency= 7.880859
Comp name = nvstreammux-stream-muxer source_id = 1 pad_index = 1 frame_num = 60               in_system_timestamp = 1744601729458.510010 out_system_timestamp = 1744601729617.937012               component_latency = 159.427002
Comp name = nvv4l2decoder5 in_system_timestamp = 1744601729450.969971 out_system_timestamp = 1744601729461.728027               component latency= 10.758057
Comp name = nvstreammux-stream-muxer source_id = 2 pad_index = 2 frame_num = 54               in_system_timestamp = 1744601729461.829102 out_system_timestamp = 1744601729617.937012               component_latency = 156.107910
Comp name = nvv4l2decoder6 in_system_timestamp = 1744601729450.303955 out_system_timestamp = 1744601729451.679932               component latency= 1.375977
Comp name = nvstreammux-stream-muxer source_id = 3 pad_index = 3 frame_num = 56               in_system_timestamp = 1744601729451.999023 out_system_timestamp = 1744601729617.937012               component_latency = 165.937988
Comp name = nvv4l2decoder2 in_system_timestamp = 1744601729450.290039 out_system_timestamp = 1744601729453.362061               component latency= 3.072021
Comp name = nvstreammux-stream-muxer source_id = 4 pad_index = 4 frame_num = 60               in_system_timestamp = 1744601729453.443115 out_system_timestamp = 1744601729617.937012               component_latency = 164.493896
Comp name = nvv4l2decoder7 in_system_timestamp = 1744601729450.186035 out_system_timestamp = 1744601729455.028076               component latency= 4.842041
Comp name = nvstreammux-stream-muxer source_id = 5 pad_index = 5 frame_num = 55               in_system_timestamp = 1744601729455.085938 out_system_timestamp = 1744601729617.937012               component_latency = 162.851074
Comp name = nvv4l2decoder3 in_system_timestamp = 1744601729451.633057 out_system_timestamp = 1744601729463.447998               component latency= 11.814941
Comp name = nvstreammux-stream-muxer source_id = 6 pad_index = 6 frame_num = 59               in_system_timestamp = 1744601729463.527100 out_system_timestamp = 1744601729617.937012               component_latency = 154.409912
Comp name = nvv4l2decoder1 in_system_timestamp = 1744601729450.252930 out_system_timestamp = 1744601729456.657959               component latency= 6.405029
Comp name = nvstreammux-stream-muxer source_id = 7 pad_index = 7 frame_num = 60               in_system_timestamp = 1744601729456.778076 out_system_timestamp = 1744601729617.937012               component_latency = 161.158936
Comp name = nvinfer0 in_system_timestamp = 1744601729618.676025 out_system_timestamp = 1744601729819.580078               component latency= 200.904053
Comp name = nvtiler in_system_timestamp = 1744601729819.641113 out_system_timestamp = 1744601729841.800049               component latency= 22.158936
Comp name = nvvideo-converter in_system_timestamp = 1744601729842.618896 out_system_timestamp = 1744601729845.227051               component latency= 2.608154
Comp name = nv-onscreendisplay in_system_timestamp = 1744601729845.322021 out_system_timestamp = 1744601729848.581055               component latency= 3.259033
Source id = 0 Frame_num = 59 Frame latency = 397.964111 (ms)
Source id = 1 Frame_num = 60 Frame latency = 398.197021 (ms)
Source id = 2 Frame_num = 54 Frame latency = 397.694092 (ms)
Source id = 3 Frame_num = 56 Frame latency = 398.360107 (ms)
Source id = 4 Frame_num = 60 Frame latency = 398.374023 (ms)
Source id = 5 Frame_num = 55 Frame latency = 398.478027 (ms)
Source id = 6 Frame_num = 59 Frame latency = 397.031006 (ms)
Source id = 7 Frame_num = 60 Frame latency = 398.411133 (ms)

Why did the batched-push-timeout property not take effect (streammux still seems to wait to synchronize frames from all sources)?

    gst_bin_add(GST_BIN(pipeline), streammux);
    g_object_set(G_OBJECT(streammux), "batch-size", rtsp_number, NULL);
    g_object_set(G_OBJECT(streammux), "live-source", TRUE, NULL);
    g_object_set(G_OBJECT(streammux), "width", 1920, NULL);
    g_object_set(G_OBJECT(streammux), "height", 1080, NULL);
    g_object_set(G_OBJECT(streammux), "batched-push-timeout", 40000, NULL);

Have you set the sink’s sync property to false? Alternatively, you can use a fakesink directly so that the pipeline runs as fast as possible.

@junshengy
Yes, my pipeline sink settings are as follows:

GstElement *sink = gst_element_factory_make("fakesink", "fake-sink");
g_object_set(G_OBJECT(sink), "sync", FALSE, NULL);

I don’t quite understand how the whole pipeline behaves. My engine is YOLOv5s with a dynamic batch dimension, and trtexec reports a throughput of 17.2461 qps at batch=8. Does that mean 17 batches of 8x3x640x640 can be inferred per second? Does this interact with the 40 ms batched-push-timeout I set on streammux, and if so, how can I resolve it? For what it’s worth, my post-processing is fast (probably within 5 ms).
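To spell out the arithmetic behind my question (rough numbers only, taken from the trtexec figure above):

#include <stdio.h>

/* Rough arithmetic only: what 17.2461 qps at batch = 8 would mean in theory. */
int main (void)
{
    const double qps = 17.2461;                    /* trtexec throughput, batch = 8 */
    const double ms_per_batch = 1000.0 / qps;      /* ~58 ms to infer one batch of 8 */
    const double frames_per_sec = qps * 8.0;       /* ~138 frames/s GPU-only upper bound */
    printf ("per-batch ~%.1f ms, ~%.0f frames/s upper bound\n",
        ms_per_batch, frames_per_sec);
    return 0;
}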

No. You can use the following command line to test the engine’s performance, but this test does not include the time spent on the network, decoding, or batch formation. A DeepStream pipeline will therefore be slower than this value.

/usr/src/tensorrt/bin/trtexec --loadEngine=xxxx.engine --iterations=100 --avgRuns=100
GPU Compute Time: min = 25.1663 ms, max = 27.2622 ms, mean = 25.9852 ms, median = 26.0652 ms, percentile(90%) = 26.6188 ms, percentile(95%) = 26.7308 ms, percentile(99%) = 26.8392 ms

For the property, refer to this FAQ.
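Briefly, and only as a summary of the documented behaviour: batched-push-timeout is specified in microseconds, and it only bounds how long nvstreammux waits after the first frame of a batch arrives before pushing an incomplete batch downstream; it does not bound the latency added by downstream elements such as nvinfer. For live RTSP inputs the relevant settings are the ones you already use:

/* Values mirror the ones posted above; batched-push-timeout is in microseconds. */
g_object_set (G_OBJECT (streammux),
    "batch-size", 8,               /* one slot per RTSP source */
    "live-source", TRUE,           /* inputs are live streams */
    "width", 1920, "height", 1080,
    "batched-push-timeout", 40000, /* 40000 us = 40 ms; push a partial batch after this */
    NULL);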

This is why I recommend using local files to measure latency. Network delays can cause some inaccuracies in measurements.

@junshengy
But in the end I will be using RTSP streams, so I don’t think testing with local files is meaningful; RTSP streams cannot be compared directly with files.

DeepStream cannot observe network latency. The method above measures the latency of elements such as nvv4l2decoder, nvstreammux, nvinfer, and nvdsosd; this is expected behavior. So what is your goal in measuring latency?

@junshengy
My purpose in measuring latency is to understand why, although my model reaches 17 qps in trtexec, the pipeline cannot process 17 batches per second. I want to know where the bottleneck is.

17 qps is just the trtexec result with no other load on the system. For a DeepStream pipeline you need to tune the network, decoder, and GPU usage to achieve the best performance.
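As an illustration only (the pgie and decoder handles are placeholders, and whether frame skipping is acceptable depends on your use case), typical knobs are the nvinfer interval property and the nvv4l2decoder drop-frame-interval property:

/* Illustrative tuning knobs; both trade completeness for lower GPU/decoder load. */
g_object_set (G_OBJECT (pgie), "interval", 1, NULL);               /* nvinfer: skip 1 batch between inferences */
g_object_set (G_OBJECT (decoder), "drop-frame-interval", 2, NULL); /* nvv4l2decoder: output every 2nd frame */

On Jetson it is also worth locking the clocks (jetson_clocks) and selecting the maximum-performance nvpmodel mode before measuring, so power management does not skew the numbers.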

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.