Remove GStreamer pipeline buffering

Hello,

For the past few days I've been trying to run the deepstream-segmentation-test on my Jetson Nano, running a Fast-SCNN model with a Pi Camera Module 3 Wide (IMX708).
I managed to run it, but the latency was too high, so I started experimenting with gst-launch-1.0.
The problem is that the model performs inference on what appear to be past frames (maybe there is buffering somewhere), so the visualization is delayed by 4-5 seconds.
In Python with onnxruntime and OpenCV, if I use appsink drop=true sync=false, the inference is performed on the latest received frame and the delay is gone, or at least much smaller.

How could I achieve the same with gst-launch, so that I can then implement it in C++ with DeepStream? How can I drop the past frames?

This buffering is also present when I run my yolov3_tiny model with deepstream-app: I notice a delay even though the performance is 30+ fps.

Here are the pipelines:
gst-launch-1.0 nvarguscamerasrc bufapi-version=true ! "video/x-raw(memory:NVMM),framerate=56/1" ! m.sink_0 nvstreammux name=m batch-size=1 width=1024 height=512 live-source=1 ! nvinfer config-file-path=/opt/nvidia/deepstream/deepstream-6.0/sources/apps/sample_apps/deepstream-segmentation-test/dstest_segmentation_config_semantic.txt ! nvsegvisual ! nvvideoconvert ! nv3dsink sync=0
And in Python:
cap = cv2.VideoCapture('nvarguscamerasrc ! video/x-raw(memory:NVMM), width=(int)1024, height=(int)512, format=(string)NV12, framerate=(fraction)56/1 ! nvvidconv ! video/x-raw, format=(string)BGRx ! videoconvert ! appsink drop=true sync=false', cv2.CAP_GSTREAMER)
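Roughly, the Python side is a plain capture-and-infer loop like the sketch below (the model path and preprocessing are simplified placeholders, not my exact code); the important part is that cap.read() always hands me the newest frame because of drop=true:

import cv2
import numpy as np
import onnxruntime as ort

pipeline = ('nvarguscamerasrc ! video/x-raw(memory:NVMM), width=(int)1024, height=(int)512, '
            'format=(string)NV12, framerate=(fraction)56/1 ! nvvidconv ! '
            'video/x-raw, format=(string)BGRx ! videoconvert ! appsink drop=true sync=false')
cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)

session = ort.InferenceSession('fast_scnn.onnx')   # placeholder model path
input_name = session.get_inputs()[0].name

while True:
    ok, frame = cap.read()                         # newest frame only, older ones were dropped
    if not ok:
        break
    blob = frame[:, :, :3].transpose(2, 0, 1)[None].astype(np.float32) / 255.0  # simplified preprocessing
    mask = session.run(None, {input_name: blob})[0]
    # ... visualize / use the segmentation mask here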

I'm new to DeepStream and GStreamer, so any help would be appreciated. Thanks!

Moving to DeepStream forum for better support.

What do you mean by this? The DeepStream pipeline is an inferencing pipeline, while the so-called OpenCV pipeline is a GStreamer pipeline for camera input only. Why do you compare these two pipelines?

I was wondering how to drop the old buffers in my DeepStream pipeline (just like it happens with the GStreamer pipeline in OpenCV); that's why I was comparing the two.

From my understanding, appsink has an internal queue, and with drop=true the older frames are dropped in favour of newer ones once the queue is full.
In DeepStream, since there is such a big delay, I assume there is a buffer somewhere that forces the model to infer on every image captured by the camera, but it should only use the most recent frame and skip the old ones.
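To illustrate what I mean by "only use the most recent frame", here is a rough sketch with the plain GStreamer Python bindings (not my DeepStream code, just the drop behaviour I am after): with max-buffers=1 and drop=true on appsink, pulling a sample always gives the latest frame and everything older is discarded.

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.parse_launch(
    'nvarguscamerasrc ! video/x-raw(memory:NVMM),framerate=56/1 ! '
    'nvvidconv ! video/x-raw,format=BGRx ! appsink name=sink')
sink = pipeline.get_by_name('sink')
sink.set_property('sync', False)        # do not wait on buffer timestamps
sink.set_property('max-buffers', 1)     # keep at most one frame queued
sink.set_property('drop', True)         # drop the old frame when a newer one arrives
pipeline.set_state(Gst.State.PLAYING)

while True:
    sample = sink.emit('pull-sample')   # blocks until the newest frame is available
    if sample is None:
        break
    buf = sample.get_buffer()
    # ... map the buffer and run inference on this latest frame only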

Maybe queue (gstreamer.freedesktop.org) can help you implement a similar function in your pipeline.

I have tried using that with the leaky functionality, but nothing changed. Am I not using it correctly?
I tried putting the queue element in a few spots in the pipeline, like this:
1)
gst-launch-1.0 nvarguscamerasrc bufapi-version=true ! "video/x-raw(memory:NVMM),framerate=56/1" ! queue leaky=2 max-size-buffers=1 ! m.sink_0 nvstreammux name=m batch-size=1 width=1024 height=512 live-source=1 ! nvinfer config-file-path=/opt/nvidia/deepstream/deepstream-6.0/sources/apps/sample_apps/deepstream-segmentation-test/dstest_segmentation_config_semantic.txt ! nvsegvisual ! nvvideoconvert ! nv3dsink sync=0
2)
gst-launch-1.0 nvarguscamerasrc bufapi-version=true ! "video/x-raw(memory:NVMM),framerate=56/1" ! m.sink_0 nvstreammux name=m batch-size=1 width=1024 height=512 live-source=1 ! queue leaky=2 max-size-buffers=1 ! nvinfer config-file-path=/opt/nvidia/deepstream/deepstream-6.0/sources/apps/sample_apps/deepstream-segmentation-test/dstest_segmentation_config_semantic.txt ! nvsegvisual ! nvvideoconvert ! nv3dsink sync=0

First, do you run the "gst-launch" pipeline as root? The model engine build takes a long time, and you need to run as root the first time to generate the engine file. Then please add the following configuration to the [property] group in /opt/nvidia/deepstream/deepstream-6.0/sources/apps/sample_apps/deepstream-segmentation-test/dstest_segmentation_config_semantic.txt.
model-engine-file=/opt/nvidia/deepstream/deepstream/samples/models/Segmentation/semantic/unetres18_v4_pruned0.65_800_data.uff_b2_gpu0_fp32.engine

Then the engine build will not happen again when you re-run the pipeline, and it will take much less time to start processing the first frame.
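For example, the [property] group of dstest_segmentation_config_semantic.txt would contain something like the following (the other keys of the sample config stay as they are; if you use your own trtexec-built engine, point model-engine-file at that file instead):

[property]
# ...existing keys of the sample config...
model-engine-file=/opt/nvidia/deepstream/deepstream/samples/models/Segmentation/semantic/unetres18_v4_pruned0.65_800_data.uff_b2_gpu0_fp32.engine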

I have already set the engine file that I built with trtexec, so the build doesn't happen. The delay is not only on the first frames; the whole pipeline is delayed.
Here are some videos:
The gst-launch video is of a yolov3-tiny model running at around 50 fps.


A video with drop=false sync=false; the model is running with onnxruntime at 20 fps.

A video with drop=true sync=false; the model is running with onnxruntime at 20 fps.

As you can see, with drop=false I can recreate the behaviour of the gst-launch pipeline (you can see the delay). What I want is to rid the gst-launch pipeline of that delay so that it behaves like the video with drop=true.

Sorry for the different resolutions, I did it in a hurry; I hope it's clearer what I want to achieve.

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
• DeepStream Version
• JetPack Version (valid for Jetson only)
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)

• Hardware Platform (Jetson / GPU) Jetson Nano
• DeepStream Version Deepstream 6.0
• JetPack Version (valid for Jetson only) 4.6.4
• TensorRT Version 8.2.1.8
• NVIDIA GPU Driver Version (valid for GPU only)

Please measure the segmentation model performance by the trtexec tool first.

Here is the output of trtexec:
best_model_12sep_dump.txt (6.6 KB)

[09/13/2023-19:03:42] [I] Total GPU Compute Time: 1.61822 s

The model is too heavy for the Jetson Nano. Adjusting "queue" will not help.

How is it too heavy? With my yolov3-tiny model, if I run trtexec the Total GPU Compute Time is 3 seconds (yolov3tiny_dump.txt (9.2 KB)), and that model is known to perform well on the Jetson Nano. I am convinced this is not a performance issue; it's the pipeline that is at fault. I can't figure out how to drop the old frames: the model treats the live feed as a video and processes every frame in sequence, when it should only process the newest available frame and drop all the others. I'm really frustrated that this issue is not addressed more; the videos I posted demonstrate it perfectly (with a really well-performing model, and you can still see the delay).
I was expecting this to be a very easy fix; I don't see any other posts like this, and I assumed that was because it's an easy fix. Is no one using DeepStream for real-time applications?

From your logs:

Yolov8
GPU Compute Time: min = 44.1274 ms, max = 82.9796 ms, mean = 46.2349 ms, median = 44.2473 ms, percentile(99%) = 82.9796 ms

Yolov3-tiny
GPU Compute Time: min = 14.4077 ms, max = 20.1235 ms, mean = 14.6525 ms, median = 14.5801 ms, percentile(99%) = 17.8984 ms

I managed to get my model to work on the Nano; unfortunately, I couldn't get DeepStream to work. For anyone wondering, I used jetson-inference by dustynv and wrote my own class, and now the model is running at around 100 ms per frame, or 10 frames per second (512x1024 input size), without visualization. That annoying delay is gone.
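For anyone who wants a starting point, the loop is roughly along the lines of the sketch below, using the stock jetson-inference Python API (my actual code wraps a custom Fast-SCNN ONNX model in its own class, so the network name here is only a stand-in):

import jetson.inference
import jetson.utils

# Stand-in network name; I load my own Fast-SCNN model through my wrapper class instead.
net = jetson.inference.segNet('fcn-resnet18-cityscapes-512x256')
camera = jetson.utils.videoSource('csi://0')   # the CSI camera (IMX708)

while True:
    img = camera.Capture()        # always the most recent frame, no backlog building up
    if img is None:
        continue
    net.Process(img)              # ~100 ms per 512x1024 frame on the Nano with my model
    # the class mask can be read back with net.Mask(...) when visualization is needed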