How to decrease the latency on TX2 when decoding an RTSP stream

Here is the GStreamer pipeline I’ve used in OpenCV to decode an RTSP stream.

"rtspsrc location=rtsp://10.10.10.10:5445 latency=100 ! queue ! rtph264depay ! h264parse ! omxh264dec ! nvvidconv ! video/x-raw, width=(int)1920, height=(int)1080,format=(string)BGRx ! queue ! videoconvert ! queue ! appsink sync=false"

Are there other methods to decrease the latency? My current latency is about 500 ms. The target is about 100 ms.

Thanks in advance.

Hi kealennieh,

I have 3 suggestions that could help you reduce latency:

  1. Avoid the videoconvert by handling BGRx directly in your application. With C++ OpenCV you can use a 4-channel (RGBA-style) matrix and ignore the alpha channel:
    // Create a Mat with an alpha channel (4 channels, e.g. BGRx/RGBA)
    Mat mat(480, 640, CV_8UC4);
    
  2. You can set your appsink to drop buffers. This will decrease latency at the cost of dropping buffers:
    'appsink max-buffers=1 drop=True'
    
  3. You can insert queues to keep the upstream elements from buffering as well (see the combined pipeline sketch below):
    'queue max-size-buffers=1 leaky=downstream'
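Putting the three suggestions together, the pipeline could look something like this (only a sketch, reusing the address and resolution from your original pipeline and dropping the videoconvert; not tested):

    rtspsrc location=rtsp://10.10.10.10:5445 latency=100 ! queue max-size-buffers=1 leaky=downstream ! rtph264depay ! h264parse ! omxh264dec ! nvvidconv ! video/x-raw, width=(int)1920, height=(int)1080, format=(string)BGRx ! queue max-size-buffers=1 leaky=downstream ! appsink max-buffers=1 drop=true sync=false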
    

Thanks for your quick reply!

The first two suggestions are easy for me to understand, but the last one is a little confusing. Where should I insert this queue? And the second and third suggestions will reduce the video quality, am I right?

The first one does not affect video quality; the second and third might. For the third one, you can insert queues between any two elements in the pipeline. It is advisable to put them before and after computation-intensive elements that run on the CPU; ‘rtph264depay’, ‘h264parse’, and ‘videoconvert’ would be examples in your current pipeline.
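For illustration, one possible placement in your current pipeline (untested, just showing where the queues could go):

    rtspsrc location=rtsp://10.10.10.10:5445 latency=100 ! queue max-size-buffers=1 leaky=downstream ! rtph264depay ! h264parse ! queue max-size-buffers=1 leaky=downstream ! omxh264dec ! nvvidconv ! video/x-raw, width=(int)1920, height=(int)1080, format=(string)BGRx ! queue max-size-buffers=1 leaky=downstream ! videoconvert ! queue max-size-buffers=1 leaky=downstream ! appsink sync=false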

The problem with the first one might be that the GStreamer capture in OpenCV doesn’t support BGRx.

Maybe it could work if you have a GStreamer pipeline feeding a v4l2loopback node with BGRx and then use the V4L2 API from OpenCV to read it in BGRx, but I haven’t tried that (I’m away from any Jetson now). That said, I’m not sure it helps with latency.

Have you tried decreasing the rtspsrc latency property?

You may also try setting a framerate.

In case it’s not done, be sure to enable all cores with nvpmodel MAXN (m0) and boost clocks with jetson_clocks.
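For reference, on a TX2 that would typically be:

    sudo nvpmodel -m 0    # MAXN: all CPU cores enabled
    sudo jetson_clocks    # lock clocks at maximum (a jetson_clocks.sh script on older JetPack releases)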

OpenCV’s GStreamer capture does support other formats that are also supported by nvvidconv, like I420. But I was thinking of a GStreamer application that uses OpenCV for processing, not using the GStreamer capture from OpenCV. We have an element named GstOpenCV that integrates OpenCV algorithms into a GStreamer pipeline:
https://gstreamer.freedesktop.org/data/events/gstreamer-conference/2017/Angel%20Phillips%20-%20GStreamer%20and%20OpenCV%20using%20a%20GstOpenCV%20element.pdf

You could also develop a C++/Python appsink application and manage the buffers yourself, launching the pipeline outside of OpenCV. Again, without using OpenCV’s GStreamer capture. A rough sketch of that approach is shown below.
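A minimal sketch of that appsink approach (untested; the pipeline string and caps are taken from earlier in this thread, and error handling is kept to a minimum):

    // Launch the pipeline with gst_parse_launch(), pull BGRx frames from a named
    // appsink, and wrap each buffer in a cv::Mat without copying.
    // Build roughly with:
    //   g++ appsink_pull.cpp $(pkg-config --cflags --libs gstreamer-1.0 gstreamer-app-1.0 opencv4)
    #include <gst/gst.h>
    #include <gst/app/gstappsink.h>
    #include <opencv2/opencv.hpp>

    int main(int argc, char *argv[]) {
        gst_init(&argc, &argv);

        GError *err = nullptr;
        GstElement *pipeline = gst_parse_launch(
            "rtspsrc location=rtsp://10.10.10.10:5445 latency=100 ! queue ! "
            "rtph264depay ! h264parse ! omxh264dec ! nvvidconv ! "
            "video/x-raw, format=(string)BGRx ! "
            "appsink name=sink max-buffers=1 drop=true sync=false",
            &err);
        if (!pipeline) {
            g_printerr("Failed to create pipeline: %s\n", err->message);
            return -1;
        }

        GstElement *sink = gst_bin_get_by_name(GST_BIN(pipeline), "sink");
        gst_element_set_state(pipeline, GST_STATE_PLAYING);

        while (true) {
            // Blocks until a new sample is available or the pipeline reaches EOS.
            GstSample *sample = gst_app_sink_pull_sample(GST_APP_SINK(sink));
            if (!sample)
                break;

            GstCaps *caps = gst_sample_get_caps(sample);
            GstStructure *s = gst_caps_get_structure(caps, 0);
            int width = 0, height = 0;
            gst_structure_get_int(s, "width", &width);
            gst_structure_get_int(s, "height", &height);

            GstBuffer *buffer = gst_sample_get_buffer(sample);
            GstMapInfo map;
            if (gst_buffer_map(buffer, &map, GST_MAP_READ)) {
                // 4 channels because of BGRx; ignore the fourth channel in your processing.
                cv::Mat frame(height, width, CV_8UC4, (void *)map.data);
                // ... OpenCV processing on 'frame' goes here ...
                gst_buffer_unmap(buffer, &map);
            }
            gst_sample_unref(sample);
        }

        gst_element_set_state(pipeline, GST_STATE_NULL);
        gst_object_unref(sink);
        gst_object_unref(pipeline);
        return 0;
    }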

@miguel.taylor,

Is there a public repository with GstOpenCV? I couldn’t find more than the PDF you sent. I’ve also seen some repos with a similar name, but they look many years old.

I’d say the main point is the final processing requirement. If it requires BGR (or RGB) processing, as most OpenCV algorithms expect, then you have to make the conversion, and I think it would be better to do this with videoconvert in GStreamer (it may execute on a different core with a queue) than in OpenCV. AFAIK, there is no YUV (I420 nor NV12) to BGR conversion available with CUDA in OpenCV, and the CPU cv::cvtColor would just be a bit slower than GStreamer’s videoconvert. Grabbing YUV frames in OpenCV would be efficient for YUV processing only.
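For reference, the CPU-side conversion mentioned above would look roughly like this (a sketch; the helper name is just for illustration):

    // Convert an I420 frame (single-channel, height*3/2 rows) to BGR on the CPU.
    #include <opencv2/opencv.hpp>

    cv::Mat bgrFromI420(const cv::Mat &i420, int width, int height) {
        CV_Assert(i420.type() == CV_8UC1 && i420.cols == width && i420.rows == height * 3 / 2);
        cv::Mat bgr;
        cv::cvtColor(i420, bgr, cv::COLOR_YUV2BGR_I420);  // runs on the CPU
        return bgr;
    }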

There is also some code published by @dusty_nv to do the conversion from YUV to BGR with CUDA, but it isn’t straightforward for beginners.

Sadly, RidgeRun’s GstOpenCV is not open source; you can contact us if you are interested in a license. However, the process of implementing your own element is not that difficult. There is already a GStreamer base class that we use for our plugin:

https://github.com/GStreamer/gst-plugins-bad/blob/master/gst-libs/gst/opencv/gstopencvvideofilter.h

@kealennieh

I’d also recommend replacing omxh264dec with avdec_h264 max-threads=1 such that your pipeline reads:

rtspsrc location=rtsp://10.10.10.10:5445 latency=100 ! queue ! rtph264depay ! h264parse ! avdec_h264 max-threads=1 ! nvvidconv ! video/x-raw, width=(int)1920, height=(int)1080,format=(string)BGRx ! queue ! videoconvert ! queue ! appsink sync=false

You may even be able to use this pipeline and omit unnecessary video conversion elements:

rtspsrc location=rtsp://10.10.10.10:5445 latency=100 ! queue ! rtph264depay ! h264parse ! avdec_h264 max-threads=1 ! videoconvert ! queue ! appsink sync=false

For some unknown reason, it takes longer to decode with the hardware decoder than with a single-threaded software decoder, regardless of the flags we’ve thrown at it, but YMMV.


@kelsius @nvidias @DaneLLL
Your method is working very well! I can pull the RTSP stream with almost no latency!
But sadly it doesn’t use the NVDEC hardware. When I use NVDEC hardware decoding, no matter what parameters I set, I get a big latency.
So, what is the problem? Can anyone from NVIDIA explain it? Please!
