Gradually increasing memory usage when using GStreamer + OpenCV

Hi guys,
I am using JetPack 4.2.2 with GStreamer 1.14.5 and OpenCV 3.4.6.
I want to use the GStreamer plugin in OpenCV for H.264 hardware decoding on the Jetson Nano.
I use these GStreamer elements in OpenCV:

gstream_elements = (
    'rtspsrc location={} latency=300 ! '
    'rtph264depay ! h264parse ! omxh264dec ! '
    'nvvidconv ! '
    'video/x-raw, format=(string)BGRx ! '
    'videoconvert ! '
    'appsink')
cv2.VideoCapture(gstream_elements, cv2.CAP_GSTREAMER)

This part of the code works correctly. Because I want to decode multiple streams, I use one thread per stream. My problem is that memory usage gradually increases over time. Why?

Hi,
We are deprecating omx plugins. Please try nvv4l2decoder.

OK, Thanks.
Is OpenCV in JetPack 4.4 compiled with GStreamer support?

Hi,

Yes. So are JetPack 4.3 and JetPack 4.2.3.

nvv4l2decoder on JetPack 4.2.2 with OpenCV 3.4.6 (built with GStreamer support) does not work correctly, but it does work in a shell terminal.
I want to know: is nvv4l2decoder on JetPack 4.4 compatible with OpenCV?

Hi,
It is OpenCV 4.1.1 on JetPack 4.4. We don't see any issue running nvv4l2decoder with OpenCV, but there may be a potential issue we have not noticed. If you still observe the issue, please share Python code so that we can reproduce it.
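A minimal reproducer might look like the sketch below. The RTSP URL is a placeholder, `build_pipeline` is a hypothetical helper for this thread, and the script assumes OpenCV was built with GStreamer support:

```python
def build_pipeline(uri, latency=300):
    """Build a GStreamer pipeline string for cv2.VideoCapture.

    Hardware H.264 decode with nvv4l2decoder; the NVMM NV12 output is
    converted to a CPU BGR buffer, which is what OpenCV expects.
    """
    return (
        f'rtspsrc location={uri} latency={latency} ! '
        'rtph264depay ! h264parse ! nvv4l2decoder ! '
        'nvvidconv ! video/x-raw, format=(string)BGRx ! '
        'videoconvert ! video/x-raw, format=(string)BGR ! '
        'appsink'
    )


def open_capture(uri):
    # Requires OpenCV built with GStreamer support (cv2.CAP_GSTREAMER).
    import cv2
    return cv2.VideoCapture(build_pipeline(uri), cv2.CAP_GSTREAMER)
```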

OK Thanks.

"rtspsrc location={} latency=300 ! video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, format=(string)NV12, framerate=(fraction)30/1 ! nvvidconv ! video/x-raw, format=(string)BGRx ! videoconvert ! appsink"

Are the above elements correct for an RTSP camera?
I have some questions:
What are the uses of these elements? What is the difference between 2 and 3?
1- rtph264depay
2- nvvidconv
3- videoconvert

Also, I don't know why you use the caps below before the nvvidconv element. Why is the first caps filter video/x-raw(memory:NVMM) and the second video/x-raw? And why is the first format=(string)NV12 and the second format=(string)BGRx? If possible, please explain your proposed order of elements and their usage. Thanks

video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, format=(string)NV12, framerate=(fraction)30/1

Hi,
The decoded frames are NVMM buffers in NV12 format. OpenCV accepts CPU buffers in BGR format. Due to a limitation of the hardware engine, we convert to BGRx format first and then copy to CPU buffers:

video/x-raw(memory:NVMM),format=(string)NV12 ! nvvidconv ! video/x-raw, format=(string)BGRx

We then utilize videoconvert to convert to BGR format.
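Spelled out as a reusable fragment, the two conversion stages described above might look like this (a sketch; the trailing BGR caps are there for OpenCV's appsink):

```python
# Stage 1: nvvidconv copies the NVMM (hardware) NV12 frame into a CPU
#          buffer, converting to BGRx on the way (the hardware engine
#          cannot produce 3-byte BGR directly).
# Stage 2: videoconvert (CPU) drops the x channel, giving the BGR
#          layout that OpenCV accepts.
CONVERT_CHAIN = (
    'video/x-raw(memory:NVMM), format=(string)NV12 ! '
    'nvvidconv ! '
    'video/x-raw, format=(string)BGRx ! '
    'videoconvert ! '
    'video/x-raw, format=(string)BGR'
)
```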

Thanks.
Eventually I have to copy the decoded frames into CPU buffers because of OpenCV. Wouldn't it be better to pass the decoded frames to CPU buffers in the first step, i.e. without video/x-raw(memory:NVMM),format=(string)NV12?
Q1- What is the most efficient solution (order of GStreamer elements) you prefer for passing decoded frames into OpenCV?
Q2- For using the decoded frames in Python code, is OpenCV the best way?

Hi,
NVMM buffers are hardware DMA buffers which are accessed directly by the hardware blocks. The hardware decoder cannot decode to CPU buffers directly. For optimal performance, we suggest running a pure GStreamer pipeline in Python like:
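For instance, a pure GStreamer pipeline can be driven from Python via the gi bindings. The sketch below (RTSP URL is a placeholder) renders with nvoverlaysink, so the decoded frames stay in NVMM memory and never touch a CPU buffer:

```python
PIPELINE_DESC = (
    'rtspsrc location=rtsp://192.168.1.10/stream latency=300 ! '
    'rtph264depay ! h264parse ! nvv4l2decoder ! nvoverlaysink'
)


def run_pipeline(desc=PIPELINE_DESC):
    # Requires the GStreamer Python bindings (python3-gi).
    import gi
    gi.require_version('Gst', '1.0')
    from gi.repository import Gst, GLib

    Gst.init(None)
    pipeline = Gst.parse_launch(desc)
    pipeline.set_state(Gst.State.PLAYING)
    try:
        GLib.MainLoop().run()   # blocks; Ctrl-C to stop
    finally:
        pipeline.set_state(Gst.State.NULL)
```

Because no nvvidconv/videoconvert stage appears, nothing in this pipeline copies frames to CPU memory.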

Thanks,
But when I run the pipeline below in OpenCV, NVDEC is activated.

“rtspsrc location={} latency=300 ! nvvidconv ! video/x-raw, format=(string)BGRx ! videoconvert ! appsink”

So the decoder uses the hardware accelerator, right? On the other hand, I don't use video/x-raw(memory:NVMM),format=(string)NV12; above you said that adding this caps filter makes the decoded data use the GPU buffer. I want to know: when I don't use
video/x-raw(memory:NVMM),format=(string)NV12 and only use nvvidconv ! video/x-raw, format=(string)BGRx, is the decoded data loaded into the CPU buffer or the GPU buffer? If the answer is the GPU buffer, then why do we use video/x-raw(memory:NVMM),format=(string)NV12? What is the advantage of using it in the pipeline?

Hi,
You may configure
$ export GST_DEBUG=*FACTORY*:4

and check the log to see whether nvv4l2decoder is picked:

0:00:00.136144226 11414   0x7f980158f0 INFO     GST_ELEMENT_FACTORY gstelementfactory.c:361:gst_element_factory_create: creating element "nvv4l2decoder"

If it is nvv4l2decoder, it is always video/x-raw(memory:NVMM) in src pad.

$ gst-inspect-1.0 nvv4l2decoder
(...skip)
  SRC template: 'src'
    Availability: Always
    Capabilities:
      video/x-raw(memory:NVMM)
                  width: [ 1, 2147483647 ]
                 height: [ 1, 2147483647 ]
(skip...)

I get these logs:
(...skip)
  SRC template: 'src'
    Availability: Always
    Capabilities:
      video/x-raw(memory:NVMM)
                  width: [ 1, 2147483647 ]
                 height: [ 1, 2147483647 ]
              framerate: [ 0/1, 2147483647/1 ]

Element has no clocking capabilities.
Element has no URI handling capabilities.

(...skip)

That shows that GStreamer supports nvv4l2decoder, right? When I use nvv4l2decoder in a terminal command, the decoder works correctly, but in OpenCV it only works with omxh264dec. Is it possible to make nvv4l2decoder work in OpenCV?

What do these values mean in the output above?
width: [ 1, 2147483647 ]
height: [ 1, 2147483647 ]
framerate: [ 0/1, 2147483647/1 ]

I also see this. It seems nvv4l2decoder fails to keep sync. You could add sync=false:

cap = cv2.VideoCapture("rtspsrc location=rtsp://127.0.0.1:8554/test ! application/x-rtp, media=video ! rtph264depay ! h264parse ! nvv4l2decoder ! nvvidconv ! video/x-raw, format=BGRx ! videoconvert ! video/x-raw, format=BGR ! appsink sync=false", cv2.CAP_GSTREAMER)

Thanks,
Why do you use the same conversion twice?

nvvidconv ! video/x-raw, format=BGRx ! videoconvert ! video/x-raw, format=BGR

In my opinion, it would be better to use it like this:

cap = cv2.VideoCapture("rtspsrc location=rtsp://127.0.0.1:8554/test ! application/x-rtp, media=video ! rtph264depay ! h264parse ! nvv4l2decoder ! nvvidconv ! video/x-raw(memory:NVMM), format=NV12 ! videoconvert ! video/x-raw, format=BGRx ! appsink sync=false", cv2.CAP_GSTREAMER)

What is sync?

Hi,
Please check the source code in OpenCV:

    // we support 11 types of data:
    //     video/x-raw, format=BGR   -> 8bit, 3 channels
    //     video/x-raw, format=GRAY8 -> 8bit, 1 channel
    //     video/x-raw, format=UYVY  -> 8bit, 2 channel
    //     video/x-raw, format=YUY2  -> 8bit, 2 channel
    //     video/x-raw, format=YVYU  -> 8bit, 2 channel
    //     video/x-raw, format=NV12  -> 8bit, 1 channel (height is 1.5x larger than true height)
    //     video/x-raw, format=NV21  -> 8bit, 1 channel (height is 1.5x larger than true height)
    //     video/x-raw, format=YV12  -> 8bit, 1 channel (height is 1.5x larger than true height)
    //     video/x-raw, format=I420  -> 8bit, 1 channel (height is 1.5x larger than true height)
    //     video/x-bayer             -> 8bit, 1 channel
    //     image/jpeg                -> 8bit, mjpeg: buffer_size x 1 x 1

BGRx is not supported.

GStreamer has a synchronization mechanism that synchronizes frame rendering against buffer timestamps. In some cases it may drop a lot of frames when it is enabled. If you face this issue, you may disable it (sync=0) for a try.
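When building the pipeline string for cv2.VideoCapture, sync (and the related appsink drop/max-buffers properties) can be set directly on appsink. A sketch, with a placeholder URL and a hypothetical helper name:

```python
def build_low_latency_pipeline(uri, latency=300):
    # sync=false : appsink does not wait on buffer timestamps before
    #              handing frames over, avoiding stalls/frame drops.
    # drop=true + max-buffers=1 : keep only the newest decoded frame,
    #              so a slow OpenCV loop does not let frames pile up.
    return (
        f'rtspsrc location={uri} latency={latency} ! '
        'rtph264depay ! h264parse ! nvv4l2decoder ! '
        'nvvidconv ! video/x-raw, format=(string)BGRx ! '
        'videoconvert ! video/x-raw, format=(string)BGR ! '
        'appsink sync=false drop=true max-buffers=1'
    )
```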

I ran this command and it gets stuck in this state. Why? When I replace nvv4l2decoder with omxh264dec, it works correctly.

I've seen a similar issue on Xavier R32.4 with h264parse before nvv4l2decoder, as reported here. You may thus try removing h264parse.

Hi @DaneLLL @kayccc @Honey_Patouceul
I have some questions. If possible, please guide me.

a) Using cv2.VideoCapture + GStreamer, this solution copies the decoded frames from the NVMM buffer to a CPU buffer, which means a duplicate copy of each decoded frame, right?

b) The Jetson Nano uses shared memory, so CPU and GPU memory are the same, right? Why do we need GPU memory? Isn't everything in CPU memory also in GPU memory?

c) If I use cv2.VideoCapture + GStreamer with the H.264 HW decoder, the decoded frames are copied from the NVMM buffer to a CPU buffer. In this case, does one decoded frame use 2x memory out of the whole memory?

d) If I use cv2.VideoCapture + GStreamer with the H.264 HW decoder, the decoded frames are copied from the NVMM buffer to a CPU buffer. In this case, if I then want to use the GPU for pre/post-processing, do we again need to copy from CPU memory to GPU memory? Does one decoded frame then use 3x memory out of the whole memory?

e) We know the disadvantage of GStreamer + OpenCV is the copy from GPU memory to CPU memory; I agree with this. But that link uses a pure GStreamer pipeline with Python code. In that case the decoded frames go to GPU memory without being copied into CPU memory, but at the line I highlighted (line 123), the decoded frames are converted to numpy format, which requires CPU memory. I want to know: in this case do we also copy GPU memory to CPU memory? In terms of performance, are the two the same? Is the copying in OpenCV + GStreamer different from the one in that link? Which one is optimal?

f) How can I access the decoded frames without converting them to numpy format? I mean, I want to do preprocessing directly in GPU memory. Or do I need to bring them into numpy format first and then do the preprocessing on the GPU?

Hi,
The function also copies data from NVMM buffer to CPU buffer:

frame_image=np.array(n_frame,copy=True,order='C')

In the sample, it checks once per 30 frames. If you check every frame, there will be performance degradation.
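The check-every-N-frames idea can be sketched as follows (the frame source is a stand-in; only the selected frames would pay for the NVMM-to-CPU copy):

```python
def frames_to_inspect(frames, interval=30):
    """Yield (index, frame) only for every `interval`-th frame, so the
    expensive copy to a numpy array is skipped for all the others."""
    for i, frame in enumerate(frames):
        if i % interval:
            continue  # no CPU copy for this frame
        # frame_image = np.array(frame, copy=True, order='C')  # the copy
        yield i, frame
```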

This is the optimal solution for Python with OpenCV + GStreamer. In C, you can leverage the dsexample plugin to process NVMM buffers through CUDA programming.