How to test the pipeline latency of deepstream-test3?

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): Jetson
• DeepStream Version: 6.2

How can I test the latency of the individual elements in the deepstream-test3 pipeline?

Please refer to this FAQ.

Please use local files to test the latency. RTSP/HTTP sources may give inaccurate results because of network delays.
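In case the link becomes unavailable, the mechanism is roughly this: export NVDS_ENABLE_LATENCY_MEASUREMENT=1 (and NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1 for the per-element lines) before running the app, and read the per-frame numbers with nvds_measure_buffer_latency() from a buffer probe placed after nvstreammux. A minimal sketch; the probe placement, the nvosd handle, and MAX_SOURCES are illustrative, not prescribed:

#include <gst/gst.h>
#include "nvds_latency_meta.h"

#define MAX_SOURCES 8   /* illustrative: number of input streams */

/* Buffer probe that prints per-source frame latency. Run the app with
 * NVDS_ENABLE_LATENCY_MEASUREMENT=1 (and NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1
 * for the per-element "Comp name = ..." lines) exported in the environment. */
static GstPadProbeReturn
latency_probe_cb (GstPad *pad, GstPadProbeInfo *info, gpointer u_data)
{
    GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER (info);
    NvDsFrameLatencyInfo latency_info[MAX_SOURCES];

    if (nvds_enable_latency_measurement) {
        guint num_sources = nvds_measure_buffer_latency (buf, latency_info);
        for (guint i = 0; i < num_sources; i++) {
            g_print ("Source id = %u Frame_num = %u Frame latency = %lf (ms)\n",
                latency_info[i].source_id, latency_info[i].frame_num,
                latency_info[i].latency);
        }
    }
    return GST_PAD_PROBE_OK;
}

/* Attach the probe on the sink pad of any element after nvstreammux,
 * e.g. the OSD; the choice of element is up to you. */
static void
attach_latency_probe (GstElement *elem_after_mux)
{
    GstPad *pad = gst_element_get_static_pad (elem_after_mux, "sink");
    gst_pad_add_probe (pad, GST_PAD_PROBE_TYPE_BUFFER, latency_probe_cb, NULL, NULL);
    gst_object_unref (pad);
}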

@junshengy
I used the method you mentioned to measure the pipeline latency. My setup has 8 RTSP streams feeding a single model, and the batched-push-timeout property of streammux is set to 40000 µs (40 ms). The output log is as follows:

************BATCH-NUM = 60**************
Comp name = nvv4l2decoder4 in_system_timestamp = 1744601729450.699951 out_system_timestamp = 1744601729460.028076               component latency= 9.328125
Comp name = nvstreammux-stream-muxer source_id = 0 pad_index = 0 frame_num = 59               in_system_timestamp = 1744601729460.087891 out_system_timestamp = 1744601729617.936035               component_latency = 157.848145
Comp name = nvv4l2decoder0 in_system_timestamp = 1744601729450.467041 out_system_timestamp = 1744601729458.347900               component latency= 7.880859
Comp name = nvstreammux-stream-muxer source_id = 1 pad_index = 1 frame_num = 60               in_system_timestamp = 1744601729458.510010 out_system_timestamp = 1744601729617.937012               component_latency = 159.427002
Comp name = nvv4l2decoder5 in_system_timestamp = 1744601729450.969971 out_system_timestamp = 1744601729461.728027               component latency= 10.758057
Comp name = nvstreammux-stream-muxer source_id = 2 pad_index = 2 frame_num = 54               in_system_timestamp = 1744601729461.829102 out_system_timestamp = 1744601729617.937012               component_latency = 156.107910
Comp name = nvv4l2decoder6 in_system_timestamp = 1744601729450.303955 out_system_timestamp = 1744601729451.679932               component latency= 1.375977
Comp name = nvstreammux-stream-muxer source_id = 3 pad_index = 3 frame_num = 56               in_system_timestamp = 1744601729451.999023 out_system_timestamp = 1744601729617.937012               component_latency = 165.937988
Comp name = nvv4l2decoder2 in_system_timestamp = 1744601729450.290039 out_system_timestamp = 1744601729453.362061               component latency= 3.072021
Comp name = nvstreammux-stream-muxer source_id = 4 pad_index = 4 frame_num = 60               in_system_timestamp = 1744601729453.443115 out_system_timestamp = 1744601729617.937012               component_latency = 164.493896
Comp name = nvv4l2decoder7 in_system_timestamp = 1744601729450.186035 out_system_timestamp = 1744601729455.028076               component latency= 4.842041
Comp name = nvstreammux-stream-muxer source_id = 5 pad_index = 5 frame_num = 55               in_system_timestamp = 1744601729455.085938 out_system_timestamp = 1744601729617.937012               component_latency = 162.851074
Comp name = nvv4l2decoder3 in_system_timestamp = 1744601729451.633057 out_system_timestamp = 1744601729463.447998               component latency= 11.814941
Comp name = nvstreammux-stream-muxer source_id = 6 pad_index = 6 frame_num = 59               in_system_timestamp = 1744601729463.527100 out_system_timestamp = 1744601729617.937012               component_latency = 154.409912
Comp name = nvv4l2decoder1 in_system_timestamp = 1744601729450.252930 out_system_timestamp = 1744601729456.657959               component latency= 6.405029
Comp name = nvstreammux-stream-muxer source_id = 7 pad_index = 7 frame_num = 60               in_system_timestamp = 1744601729456.778076 out_system_timestamp = 1744601729617.937012               component_latency = 161.158936
Comp name = nvinfer0 in_system_timestamp = 1744601729618.676025 out_system_timestamp = 1744601729819.580078               component latency= 200.904053
Comp name = nvtiler in_system_timestamp = 1744601729819.641113 out_system_timestamp = 1744601729841.800049               component latency= 22.158936
Comp name = nvvideo-converter in_system_timestamp = 1744601729842.618896 out_system_timestamp = 1744601729845.227051               component latency= 2.608154
Comp name = nv-onscreendisplay in_system_timestamp = 1744601729845.322021 out_system_timestamp = 1744601729848.581055               component latency= 3.259033
Source id = 0 Frame_num = 59 Frame latency = 397.964111 (ms)
Source id = 1 Frame_num = 60 Frame latency = 398.197021 (ms)
Source id = 2 Frame_num = 54 Frame latency = 397.694092 (ms)
Source id = 3 Frame_num = 56 Frame latency = 398.360107 (ms)
Source id = 4 Frame_num = 60 Frame latency = 398.374023 (ms)
Source id = 5 Frame_num = 55 Frame latency = 398.478027 (ms)
Source id = 6 Frame_num = 59 Frame latency = 397.031006 (ms)
Source id = 7 Frame_num = 60 Frame latency = 398.411133 (ms)

Why did the batched-push-timeout property not take effect (streammux still seems to wait to synchronize frames from all sources)?

    gst_bin_add(GST_BIN(pipeline), streammux);
    g_object_set(G_OBJECT(streammux), "batch-size", rtsp_number, NULL);
    g_object_set(G_OBJECT(streammux), "live-source", TRUE, NULL);
    g_object_set(G_OBJECT(streammux), "width", 1920, NULL);
    g_object_set(G_OBJECT(streammux), "height", 1080, NULL);
    g_object_set(G_OBJECT(streammux), "batched-push-timeout", 40000, NULL);

Have you set the sink’s sync property to false? Alternatively, you can use a fakesink directly so that the pipeline runs as fast as possible.

@junshengy
Yes, my pipeline sink settings are as follows:

GstElement *sink = gst_element_factory_make("fakesink", "fake-sink");
g_object_set(G_OBJECT(sink), "sync", FALSE, NULL);

I don’t quite understand how the whole pipeline behaves. My engine is YOLOv5s with a dynamic batch dimension, and trtexec reports a throughput of 17.2461 qps at batch=8. Does that mean 17 batches of 8x3x640x640 can be inferred per second? Does this interact with the 40 ms batched-push-timeout I set on streammux, and if so, how can I resolve it? For what it’s worth, my post-processing is fast (probably within 5 ms).
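To spell out the arithmetic behind my question (rough numbers only, taken from the trtexec figure above):

#include <stdio.h>

/* Rough arithmetic only: what 17.2461 qps at batch = 8 would mean in theory. */
int main (void)
{
    const double qps = 17.2461;                    /* trtexec throughput, batch = 8 */
    const double ms_per_batch = 1000.0 / qps;      /* ~58 ms to infer one batch of 8 */
    const double frames_per_sec = qps * 8.0;       /* ~138 frames/s GPU-only upper bound */
    printf ("per-batch ~%.1f ms, ~%.0f frames/s upper bound\n",
        ms_per_batch, frames_per_sec);
    return 0;
}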

No. You can use the following command line to test the engine’s performance, but this test does not include the time spent on the network, decoding, or batch formation. A DeepStream pipeline will therefore be slower than this value.

/usr/src/tensorrt/bin/trtexec --loadEngine=xxxx.engine --iterations=100 --avgRuns=100
GPU Compute Time: min = 25.1663 ms, max = 27.2622 ms, mean = 25.9852 ms, median = 26.0652 ms, percentile(90%) = 26.6188 ms, percentile(95%) = 26.7308 ms, percentile(99%) = 26.8392 ms

For the property, refer to this FAQ.
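Briefly, and only as a summary of the documented behaviour: batched-push-timeout is specified in microseconds, and it only bounds how long nvstreammux waits after the first frame of a batch arrives before pushing an incomplete batch downstream; it does not bound the latency added by downstream elements such as nvinfer. For live RTSP inputs the relevant settings are the ones you already use:

/* Values mirror the ones posted above; batched-push-timeout is in microseconds. */
g_object_set (G_OBJECT (streammux),
    "batch-size", 8,               /* one slot per RTSP source */
    "live-source", TRUE,           /* inputs are live streams */
    "width", 1920, "height", 1080,
    "batched-push-timeout", 40000, /* 40000 us = 40 ms; push a partial batch after this */
    NULL);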

This is why I recommend using local files to measure latency. Network delays can cause some inaccuracies in measurements.

@junshengy
But in the end I will be using RTSP streams, so I don’t think testing with local files is meaningful; RTSP streams cannot be compared directly with files.

DeepStream cannot observe network latency. The method above measures the latency of elements such as nvv4l2decoder, nvstreammux, nvinfer, and nvdsosd; this is expected behavior. So what is your goal in measuring latency?

@junshengy
My purpose in measuring latency is to understand why, although my model reaches 17 qps in trtexec, the pipeline cannot process 17 batches per second. I want to know where the bottleneck is.

17 qps is just the trtexec result with no other load on the system. For a DeepStream pipeline you need to tune the network, decoder, and GPU usage to achieve the best performance.
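As an illustration only (the pgie and decoder handles are placeholders, and whether frame skipping is acceptable depends on your use case), typical knobs are the nvinfer interval property and the nvv4l2decoder drop-frame-interval property:

/* Illustrative tuning knobs; both trade completeness for lower GPU/decoder load. */
g_object_set (G_OBJECT (pgie), "interval", 1, NULL);               /* nvinfer: skip 1 batch between inferences */
g_object_set (G_OBJECT (decoder), "drop-frame-interval", 2, NULL); /* nvv4l2decoder: output every 2nd frame */

On Jetson it is also worth locking the clocks (jetson_clocks) and selecting the maximum-performance nvpmodel mode before measuring, so power management does not skew the numbers.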

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.