Interval has no impact on the execution time

• Hardware Platform (Jetson / GPU) NVIDIA GeForce RTX 3090
• DeepStream Version 6.3
• JetPack Version (valid for Jetson only)
• TensorRT Version 8.4.0
• NVIDIA GPU Driver Version (valid for GPU only) 535.113.01
• Issue Type(questions, new requirements, bugs) questions

Hello,

I am using a DeepStream pipeline in Python. My pipeline consists of a detector (pgie) → tracker → classifier1 (sgie) → classifier2 (sgie2).

It uses a Triton server for the GIE inference, so I set the interval property in the config file of each GIE. I noticed that whenever I change the value of the interval, the execution time never changes.

This is how I calculate the execution time:

    print("Now playing...")
    for i, source in enumerate(args):
        print(i, ": ", source)

    print("Starting pipeline \n")
    pipeline.set_state(Gst.State.PLAYING)
    try:
        time_st = time.time()
        loop.run()
        time_end = time.time()
        print("frame time = ",time_end-time_st)
    except:
        pass
    # cleanup
    print("Exiting app\n")
    pipeline.set_state(Gst.State.NULL)

I used the bInferDone attribute to check whether the interval works correctly. And yes! If I set the interval property to 5, I get a true value from the bInferDone attribute every 5 frames.

  • bInferDone : int — Boolean indicating whether inference is performed on the given frame.

Any explanation?

Thanks.

Hi @JoeShz

  1. That’s not an accurate way of measuring performance. You are measuring the running time of the whole loop instead of the interlatency between the pgie sink pad and the sgie2 src pad, which would give you an accurate measurement of how long it takes a buffer to pass through all your inference elements.

  2. The process interval won’t affect performance if something else is the bottleneck: for example, memory copies, encoding, decoding, or conversions.
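A toy model of that second point (plain Python, a deliberate simplification rather than DeepStream code): with queues between elements the stages run in parallel, so the steady-state runtime is set by the slowest stage, and making inference cheaper (as a larger interval effectively does) changes nothing while decode dominates.

```python
def pipeline_runtime_ms(stage_cost_ms, n_frames):
    """Approximate steady-state runtime of a pipelined chain of elements.

    Queues let the elements work in parallel, so throughput is limited
    by the slowest stage, not by the sum of all stages.
    """
    return n_frames * max(stage_cost_ms.values())

# Decode at 30 ms/frame dominates: cutting inference from 10 ms to 2 ms
# per frame leaves the total runtime unchanged.
slow = pipeline_runtime_ms({"decode": 30, "infer": 10}, 100)  # 3000 ms
fast = pipeline_runtime_ms({"decode": 30, "infer": 2}, 100)   # 3000 ms
```

This is why the execution-time measurement stays flat as the interval changes: it reflects the bottleneck, not the inference load.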

Hi @miguel.taylor

Would you tell me in detail how I can measure the interlatency between the pipeline components?

That would be so nice of you.

Sure:

  1. First, isolate the pipeline from the application so that it can be debugged using gst-launch-1.0. This step is not mandatory, but it greatly simplifies the debugging process.

  2. To obtain the pipeline, you can either generate a dot file and reconstruct the pipeline from it or manually construct the pipeline by adding the same elements that you are including in your application. Here is an example DeepStream pipeline that you can adapt for your specific case.

gst-launch-1.0 \
uridecodebin3 uri=file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4 ! queue ! \
nvstreammux0.sink_0 nvstreammux name=nvstreammux0 batch-size=1 batched-push-timeout=40000 width=1920 height=1080 live-source=TRUE ! queue ! \
nvvideoconvert ! queue ! \
nvinfer name=nvinfer1 config-file-path="/opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-test1/dstest1_pgie_config.txt" ! queue ! \
nvtracker tracker-width=240 tracker-height=200 ll-lib-file=libnvds_mot_iou.so ll-config-file=iou_config.txt ! queue ! \
nvinfer name=nvinfer2 process-mode=secondary infer-on-gie-id=1 infer-on-class-ids="0:" batch-size=16 config-file-path="/opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-test2/dstest2_sgie1_config.txt" ! queue ! \
nvinfer name=nvinfer3 process-mode=secondary infer-on-gie-id=1 infer-on-class-ids="0:" batch-size=16 config-file-path="/opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-test2/dstest2_sgie2_config.txt" ! queue ! \
nvinfer name=nvinfer4 process-mode=secondary infer-on-gie-id=1 infer-on-class-ids="0:" batch-size=16 config-file-path="/opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-test2/dstest2_sgie3_config.txt" ! queue ! \
fakesink name=fakesink0 sync=false
  3. Newer GStreamer versions (1.20+) already include the interlatency tracer among the default tracers. For older versions, you will need to install a custom tracer to measure interlatency. We have an open-source project called GstShark that installs many additional tracers, including interlatency. Here are the instructions to get GstShark on your board:
sudo apt install -y libgraphviz-dev
git clone git@github.com:RidgeRun/gst-shark.git
cd gst-shark/
meson build --prefix /usr/
ninja -C build
sudo ninja -C build install
  4. Run the pipeline with the interlatency tracer enabled:
GST_DEBUG=*TRACE*:9 GST_TRACERS="interlatency" gst-launch-1.0 ...
  5. You will get logs with the latency at different points of the pipeline:
0:05:16.383276965 91783 0xaaaaefd91aa0 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)source_src, to_pad=(string)fakesink0_sink, time=(string)0:00:00.185861971;

The pipeline above has an interlatency of roughly 186 ms from start to finish.

You can achieve the same result if you probe your buffers on the uridecodebin source pad and the fakesink sink pad and then print their presentation timestamps (PTS) to compare them. However, this method is slightly less accurate for measurement because it doesn’t take into consideration the elements inside bins, and the probes will be run by independent threads, causing the measurement to vary between runs depending on the thread scheduling.

Hello @miguel.taylor

Thank you for your explanation!

But I can’t clone GstShark; it keeps saying:

fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.

Is it still available?

Hi

Sorry, I shared the command to clone it with SSH. Try this one instead:

git clone https://github.com/RidgeRun/gst-shark.git

Thank you for your fast response @miguel.taylor

I am running these commands inside the DeepStream 6.3 Docker container.

sudo apt install -y libgraphviz-dev
git clone https://github.com/RidgeRun/gst-shark.git
cd gst-shark/
meson build --prefix /usr/
ninja -C build
sudo ninja -C build install

But, I get this error:

The Meson build system
Version: 0.53.2
Source dir: /opt/nvidia/deepstream/deepstream-6.3/sources/deepstream_python_apps/apps/gst-shark
Build dir: /opt/nvidia/deepstream/deepstream-6.3/sources/deepstream_python_apps/apps/gst-shark/build
Build type: native build

meson.build:27:27: ERROR: Expecting eof got string.
gst_shark_version_short = f'@gst_shark_version_major@.@gst_shark_version_minor@.@gst_shark_version_micro@'
                           ^

A full log can be found at /opt/nvidia/deepstream/deepstream-6.3/sources/deepstream_python_apps/apps/gst-shark/build/meson-logs/meson-log.txt

Any idea how I can make it work?

That’s a compatibility issue with your meson version. Try installing meson from pip:

sudo apt install python3-pip ninja-build meson
pip3 install meson

Hi @miguel.taylor

Okay, I could get it to work finally!!

But, I don’t understand these logs xD
There are a lot of them, and they all go from_pad=(string)source_src to different pads. Should I consider only the last one?

0:00:22.472242527 35920 0x55dbac8e12a0 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)source_src, to_pad=(string)fakesink0_sink, time=(string)0:00:00.041825287;

If this is the time taken from source_src to fakesink0_sink, then why am I getting Execution ended after 0:00:12.361617439 at the end?

When I remove GST_DEBUG=*TRACE*:9 GST_TRACERS="interlatency", I get a smaller execution time, e.g. Execution ended after 0:00:02.375646066. Is this because of the printing?

Random sample of logs:

0:00:22.471484486 35920 0x55dbac8e1460 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)source_src, to_pad=(string)queue4_src, time=(string)0:00:00.041073396;
0:00:22.471567547 35920 0x55dbac8e1460 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)source_src, to_pad=(string)nvinfer2_src, time=(string)0:00:00.041163057;
0:00:22.471605718 35920 0x55dbac8e1580 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)source_src, to_pad=(string)nvtracker0_src, time=(string)0:00:00.041197128;
0:00:22.471606318 35920 0x55dbac8e1400 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)source_src, to_pad=(string)queue5_src, time=(string)0:00:00.041193398;
0:00:22.471695159 35920 0x55dbac8e1460 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)source_src, to_pad=(string)queue4_src, time=(string)0:00:00.041286449;
0:00:22.471735210 35920 0x55dbac8e1460 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)source_src, to_pad=(string)nvinfer2_src, time=(string)0:00:00.041331490;
0:00:22.471744890 35920 0x55dbac8e1400 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)source_src, to_pad=(string)nvinfer3_src, time=(string)0:00:00.041335720;
0:00:22.471824301 35920 0x55dbac8e1400 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)source_src, to_pad=(string)queue5_src, time=(string)0:00:00.041416961;
0:00:22.471835131 35920 0x55dbac8e11e0 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)source_src, to_pad=(string)queue6_src, time=(string)0:00:00.041424651;
0:00:22.471973603 35920 0x55dbac8e11e0 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)source_src, to_pad=(string)nvinfer4_src, time=(string)0:00:00.041568893;
0:00:22.471963913 35920 0x55dbac8e1400 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)source_src, to_pad=(string)nvinfer3_src, time=(string)0:00:00.041489522;
0:00:22.472013634 35920 0x55dbac8e12a0 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)source_src, to_pad=(string)queue7_src, time=(string)0:00:00.041601183;
0:00:22.472015914 35920 0x55dbac8e11e0 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)source_src, to_pad=(string)queue6_src, time=(string)0:00:00.041611854;
0:00:22.472092595 35920 0x55dbac8e11e0 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)source_src, to_pad=(string)nvinfer4_src, time=(string)0:00:00.041689195;
0:00:22.472128835 35920 0x55dbac8e12a0 TRACE             GST_TRACER :0:: interlatency, from_pad=(string)source_src, to_pad=(string)fakesink0_sink, time=(string)0:00:00.041601183;

Should I consider the last one only?

The pads are named <element-name>_<pad-name>. Each GStreamer filter-type element has a sink/input pad and a src/output pad; some elements have more or fewer pads. Ideally, you would take the pads of the elements you want to measure latency between: the src pad of the initial element and the sink pad of the final element. Take a couple of samples (each sample represents a frame) and average the results to get an accurate measurement.

In your case I would take the samples between source_src and fakesink0_sink.
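The filtering and averaging can be scripted; here is a minimal sketch (the function name is mine) that parses the tracer output and averages the samples for one destination pad:

```python
import re

# Matches the interlatency tracer output, e.g.:
# ... interlatency, from_pad=(string)source_src, to_pad=(string)fakesink0_sink, time=(string)0:00:00.041825287;
PATTERN = re.compile(
    r"interlatency, from_pad=\(string\)(\S+?), "
    r"to_pad=\(string\)(\S+?), "
    r"time=\(string\)(\d+):(\d+):(\d+\.\d+);"
)

def average_interlatency_ms(log_lines, to_pad="fakesink0_sink"):
    """Average the source->to_pad interlatency over all samples, in ms."""
    samples = []
    for line in log_lines:
        m = PATTERN.search(line)
        if m and m.group(2) == to_pad:
            h, mi, s = int(m.group(3)), int(m.group(4)), float(m.group(5))
            samples.append((h * 3600 + mi * 60 + s) * 1000.0)
    return sum(samples) / len(samples) if samples else None
```

For the two fakesink0_sink samples in the logs above, this would report about 41.7 ms.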

then why am I getting Execution ended after 0:00:12.361617439 at the end?

That is the runtime of the whole pipeline.

I get a smaller execution time, e.g. Execution ended after 0:00:02.375646066. Is this because of the printing?

Yes, as with any logging or profiling, it will have an impact on execution time.


Okay, got it!

I have one more small question. The interval property specifies the number of consecutive batches to be skipped for inference. Is there anything I can use to skip frames instead? I am not interested in all frames; I just want to process f1, f6, f11, …, fn.
For simplicity: if I have a video of 100 frames, I want the resulting video/frames to be about 100/5 frames.

Thank you so much @miguel.taylor
Your support is very useful.

You can add videorate to your pipeline to reduce the framerate by dropping frames:

uridecodebin3 uri=... ! videorate drop-only=true max-rate=10 ! ...

However, the ideal solution would be to reduce the framerate on the producer side of the stream if you can, to avoid decoding the additional frames.
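As a sanity check on the arithmetic (plain Python, not a DeepStream API): keeping every fifth frame of a 100-frame clip leaves 20 frames, and for a 30 fps source that selection corresponds to videorate max-rate=6.

```python
def kept_frames(total_frames, step):
    """0-based indices of frames that survive keep-1-in-step dropping
    (the f1, f6, f11, ... selection from the question)."""
    return list(range(0, total_frames, step))

frames = kept_frames(100, 5)  # [0, 5, 10, ..., 95]
# 100 frames with step=5 -> 20 frames; a 30 fps source drops to 30/5 = 6 fps.
```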

You can also drop frames with a leaky queue. This element only drops buffers once its maximum size is exceeded because a downstream element is taking too long to process them.

... ! queue leaky=2 max-size-buffers=10 ! ...
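As a toy illustration (plain Python, not GStreamer internals) of why a leaky queue only drops under back-pressure: with leaky=2 ("downstream"), new buffers are discarded only while the queue is already full.

```python
from collections import deque

class LeakyDownstreamQueue:
    """Toy model of `queue leaky=2 max-size-buffers=N`.

    Incoming buffers are discarded only while the queue is full,
    i.e. only when the consumer falls behind.
    """
    def __init__(self, max_size_buffers):
        self.max_size = max_size_buffers
        self.buffers = deque()
        self.dropped = 0

    def push(self, buf):
        if len(self.buffers) >= self.max_size:
            self.dropped += 1  # downstream too slow -> leak the new buffer
        else:
            self.buffers.append(buf)

    def pop(self):
        return self.buffers.popleft() if self.buffers else None
```

If the consumer keeps up, nothing is dropped; if it stalls, only the overflow leaks, so this is not a reliable way to get an exact 1-in-5 selection.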

Thank you so much for your support @miguel.taylor ^^

It was really helpful chatting with you.

