Gradually increasing memory usage when using GStreamer + OpenCV

Hi guys,
I'm using JetPack 4.2.2, GStreamer 1.14.5, and OpenCV 3.4.6.
I want to use the GStreamer plugin in OpenCV for H.264 hardware decoding on a Jetson Nano.
I use these GStreamer elements in OpenCV:

import cv2

gstream_elements = (
    'rtspsrc location={} latency=300 ! '
    'rtph264depay ! h264parse ! omxh264dec ! '
    'nvvidconv ! '
    'video/x-raw, format=(string)BGRx ! '
    'videoconvert ! '
    'appsink')
# Fill in the RTSP URL with gstream_elements.format(...) before opening.
cap = cv2.VideoCapture(gstream_elements, cv2.CAP_GSTREAMER)

This code works correctly. Because I want to decode multiple streams, I use threads, but memory usage gradually increases over time. Why?
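The thread doesn't show the threading code, so the following is only an assumption about one common cause: reader threads pushing frames into an unbounded list or queue faster than the consumer drains them. A minimal sketch that keeps only the newest frame per stream, using a hypothetical `LatestFrameReader` class; `capture` is anything whose `read()` returns `(ok, frame)`, such as `cv2.VideoCapture`:

```python
import queue
import threading


class LatestFrameReader:
    """Hypothetical helper: reads frames on a background thread and keeps
    only the newest one, so memory per stream stays bounded."""

    def __init__(self, capture):
        self._capture = capture  # e.g. cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
        self._frames = queue.Queue(maxsize=1)
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()
        return self

    def _run(self):
        while not self._stopped.is_set():
            ok, frame = self._capture.read()
            if not ok:
                break
            # Discard the stale frame (if the consumer hasn't taken it yet)
            # so the queue never holds more than one frame.
            try:
                self._frames.get_nowait()
            except queue.Empty:
                pass
            self._frames.put(frame)
        self._stopped.set()

    def latest(self, timeout=1.0):
        """Return the newest frame, or None if none arrives within timeout."""
        try:
            return self._frames.get(timeout=timeout)
        except queue.Empty:
            return None

    def stop(self):
        self._stopped.set()
        self._thread.join()
```

One reader per stream is started with `LatestFrameReader(cap).start()`; call `cap.release()` after `stop()` so the GStreamer pipeline and its buffers are freed. Note that `stop()` waits for the current `read()` to return, which can take a while on a stalled stream.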

Hi,
We are deprecating omx plugins. Please try nvv4l2decoder.

OK, thanks.
Is OpenCV in JetPack 4.4 compiled with GStreamer support?

Hi,

Yes. So are JetPack 4.3 and JetPack 4.2.3.

With JetPack 4.2.2 and OpenCV 3.4.6 (built with GStreamer support), nvv4l2decoder doesn't work correctly, but it does work from the shell.
I want to know: is nvv4l2decoder in JetPack 4.4 compatible with OpenCV?

Hi,
JetPack 4.4 ships OpenCV 4.1.1. We don't see any issue running nvv4l2decoder with OpenCV, but there may be a potential issue we haven't noticed. If you still observe the issue, please share Python code like

So that we can reproduce it.

OK, thanks.

"rtspsrc location={} latency=300 ! video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080,format=(string)NV12, framerate=(fraction)30/1 ! nvvidconv ! video/x-raw, format=(string)BGRx ! videoconvert ! appsink"

Is the pipeline above correct for an RTSP camera?
I have some questions:
What are these elements used for? What's the difference between 2 and 3?
1- rtph264depay
2- nvvidconv
3- videoconvert

Also, I don't understand why you use the caps below before the nvvidconv element. Why is the first part video/x-raw(memory:NVMM) and the second part video/x-raw? And why is the first part format=(string)NV12 and the second part format=(string)BGRx? If possible, please explain your proposed order of elements and their usage in more detail. Thanks.

video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080,format=(string)NV12, framerate=(fraction)30/1

Hi,
The decoded frames are NVMM buffers in NV12 format. OpenCV accepts CPU buffers in BGR format. Due to a limitation of the hardware engine (it cannot output 3-channel BGR), we convert to BGRx first and then copy to CPU buffers:

video/x-raw(memory:NVMM),format=(string)NV12 ! nvvidconv ! video/x-raw, format=(string)BGRx

Then we use videoconvert (which runs on the CPU) to convert to BGR format.
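As a rough illustration of that last CPU step (a NumPy analogy only, not what videoconvert runs internally), converting a packed BGRx frame to BGR amounts to dropping the padding byte of every pixel, which shrinks the frame from 4 to 3 bytes per pixel. `bgrx_to_bgr` is a hypothetical helper name:

```python
import numpy as np


def bgrx_to_bgr(bgrx: np.ndarray) -> np.ndarray:
    """Drop the 4th (padding) byte of a (height, width, 4) BGRx frame.

    Returns a contiguous (height, width, 3) BGR array, the layout
    OpenCV expects for a 3-channel 8-bit image.
    """
    if bgrx.ndim != 3 or bgrx.shape[2] != 4:
        raise ValueError("expected a (height, width, 4) array")
    return np.ascontiguousarray(bgrx[:, :, :3])


frame = np.zeros((1080, 1920, 4), dtype=np.uint8)
frame[..., 0] = 255  # fill the blue channel
bgr = bgrx_to_bgr(frame)
print(bgr.shape, bgr.nbytes)  # (1080, 1920, 3) 6220800
```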

Thanks.
Since I eventually have to copy the decoded frames into CPU buffers for OpenCV, isn't it better to pass the decoded frames to CPU buffers in the first step, i.e. without video/x-raw(memory:NVMM),format=(string)NV12?
Q1- What's the most efficient solution (order of GStreamer elements) you'd recommend for passing decoded frames into OpenCV?
Q2- For using the decoded frames in Python code, is OpenCV the best way?

Hi,
An NVMM buffer is a hardware DMA buffer that is accessed directly by hardware blocks. The hardware decoder cannot decode into a CPU buffer directly. For optimal performance, we suggest running a pure GStreamer pipeline in Python, like:

Thanks.
But when I run the pipeline below in OpenCV, NVDEC is activated:

"rtspsrc location={} latency=300 ! nvvidconv ! video/x-raw, format=(string)BGRx ! videoconvert ! appsink"

So the decoder uses the hardware accelerator, right? On the other hand, I don't use video/x-raw(memory:NVMM),format=(string)NV12. Above, you said that adding this caps string makes the decoded data use a GPU buffer. I want to know: when I don't use
video/x-raw(memory:NVMM),format=(string)NV12 and only use nvvidconv ! video/x-raw, format=(string)BGRx, is the decoded data loaded into a CPU buffer or a GPU buffer? If the answer is a GPU buffer, why do we use video/x-raw(memory:NVMM),format=(string)NV12? What's the advantage of including it in the pipeline?

Hi,
You may set
$ export GST_DEBUG=*FACTORY*:4

and check the log to see whether nvv4l2decoder is picked:

0:00:00.136144226 11414   0x7f980158f0 INFO     GST_ELEMENT_FACTORY gstelementfactory.c:361:gst_element_factory_create: creating element "nvv4l2decoder"

If it is nvv4l2decoder, its src pad always outputs video/x-raw(memory:NVMM):

$ gst-inspect-1.0 nvv4l2decoder
(...skip)
  SRC template: 'src'
    Availability: Always
    Capabilities:
      video/x-raw(memory:NVMM)
                  width: [ 1, 2147483647 ]
                 height: [ 1, 2147483647 ]
(skip...)
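If you capture that debug output to a file (GStreamer writes its debug log to stderr by default, so e.g. `python3 app.py 2> gst.log`), a small script can list which elements were actually created. A minimal sketch; the helper names and the regex are assumptions based on the log line format shown above:

```python
import re

# Matches lines such as:
# ... gstelementfactory.c:361:gst_element_factory_create: creating element "nvv4l2decoder"
_CREATE_RE = re.compile(r'gst_element_factory_create: creating element "([^"]+)"')


def created_elements(log_lines):
    """Return element factory names in the order they were created."""
    names = []
    for line in log_lines:
        match = _CREATE_RE.search(line)
        if match:
            names.append(match.group(1))
    return names


def uses_decoder(log_lines, decoder="nvv4l2decoder"):
    """True if the given decoder element was created in this run."""
    return decoder in created_elements(log_lines)
```

For example, `with open("gst.log") as f: print(uses_decoder(f))` reports whether OpenCV's pipeline picked nvv4l2decoder.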

I get these logs:
(…skip)
  SRC template: 'src'
    Availability: Always
    Capabilities:
      video/x-raw(memory:NVMM)
                  width: [ 1, 2147483647 ]
                 height: [ 1, 2147483647 ]
              framerate: [ 0/1, 2147483647/1 ]

Element has no clocking capabilities.
Element has no URI handling capabilities.

(…skip)

That shows GStreamer supports nvv4l2decoder, right? When I use nvv4l2decoder in a terminal command, the decoder works correctly, but in OpenCV only omxh264dec works. Is it possible to make nvv4l2decoder work in OpenCV?

What do these values in the output above mean?
width: [ 1, 2147483647 ]
height: [ 1, 2147483647 ]
framerate: [ 0/1, 2147483647/1 ]

I also see this. It seems nvv4l2decoder fails to keep sync. You could add sync=false:

cap = cv2.VideoCapture("rtspsrc location=rtsp://127.0.0.1:8554/test ! application/x-rtp, media=video ! rtph264depay ! h264parse ! nvv4l2decoder ! nvvidconv ! video/x-raw, format=BGRx ! videoconvert ! video/x-raw, format=BGR ! appsink sync=false", cv2.CAP_GSTREAMER)

Thanks.
Why do you convert twice?

nvvidconv ! video/x-raw, format=BGRx ! videoconvert ! video/x-raw, format=BGR

In my opinion, it's better to use this:

cap = cv2.VideoCapture("rtspsrc location=rtsp://127.0.0.1:8554/test ! application/x-rtp, media=video ! rtph264depay ! h264parse ! nvv4l2decoder ! nvvidconv ! video/x-raw(memory:NVMM), format=NV12 ! videoconvert ! video/x-raw, format=BGRx ! appsink sync=false", cv2.CAP_GSTREAMER)

What does sync do?

Hi,
Please check the source code in OpenCV:

    // we support 11 types of data:
    //     video/x-raw, format=BGR   -> 8bit, 3 channels
    //     video/x-raw, format=GRAY8 -> 8bit, 1 channel
    //     video/x-raw, format=UYVY  -> 8bit, 2 channel
    //     video/x-raw, format=YUY2  -> 8bit, 2 channel
    //     video/x-raw, format=YVYU  -> 8bit, 2 channel
    //     video/x-raw, format=NV12  -> 8bit, 1 channel (height is 1.5x larger than true height)
    //     video/x-raw, format=NV21  -> 8bit, 1 channel (height is 1.5x larger than true height)
    //     video/x-raw, format=YV12  -> 8bit, 1 channel (height is 1.5x larger than true height)
    //     video/x-raw, format=I420  -> 8bit, 1 channel (height is 1.5x larger than true height)
    //     video/x-bayer             -> 8bit, 1 channel
    //     image/jpeg                -> 8bit, mjpeg: buffer_size x 1 x 1

BGRx is not supported.
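Based on the list above, a small hypothetical check can flag a pipeline string whose caps filter right before appsink asks for something OpenCV cannot accept (an unlisted raw format such as BGRx, or NVMM memory). This sketch only parses the string; it does not query GStreamer, and when appsink is preceded by an element such as videoconvert it leaves the result to caps negotiation and reports no problem:

```python
import re

# Raw formats listed as supported in the OpenCV source comments above.
OPENCV_RAW_FORMATS = {"BGR", "GRAY8", "UYVY", "YUY2", "YVYU",
                      "NV12", "NV21", "YV12", "I420"}

_FORMAT_RE = re.compile(r"format=(?:\(string\))?(\w+)")
_MEDIA_TYPE_RE = re.compile(r"^(video|audio|image|application)/")


def caps_before_appsink(pipeline: str):
    """Return the caps filter placed directly before appsink, or None."""
    parts = [part.strip() for part in pipeline.split("!")]
    for i, part in enumerate(parts):
        if i > 0 and part.split()[:1] == ["appsink"]:
            previous = parts[i - 1]
            return previous if _MEDIA_TYPE_RE.match(previous) else None
    return None


def explicit_format_ok(pipeline: str) -> bool:
    """False if an explicit video/x-raw caps before appsink requests NVMM
    memory or a format missing from OPENCV_RAW_FORMATS."""
    caps = caps_before_appsink(pipeline)
    if caps is None or not caps.startswith("video/x-raw"):
        return True  # nothing explicit to check in this sketch
    if "memory:NVMM" in caps:
        return False  # OpenCV needs CPU buffers
    match = _FORMAT_RE.search(caps)
    return match is None or match.group(1) in OPENCV_RAW_FORMATS
```

Applied to the alternative pipeline proposed earlier in this thread (ending in `video/x-raw, format=BGRx ! appsink`), the check returns False, while the pipeline ending in `format=BGR ! appsink` passes.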

GStreamer has a synchronization mechanism: sinks render frames according to their timestamps. When it is enabled, it may drop a lot of frames in some cases. If you face this issue, you may try disabling it (sync=0).

I ran this command and it gets stuck in this state. Why? When I replace nvv4l2decoder with omxh264dec, it works correctly.

I've seen a similar issue on Xavier with R32.4 when h264parse is placed before nvv4l2decoder, as reported here. You may try removing h264parse.
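To try these suggestions one at a time (with or without h264parse, with sync enabled or disabled), it can help to build the pipeline string from parameters. A minimal sketch; `build_rtsp_pipeline` is a hypothetical helper, the element chain is the one from the sync=false example in this thread, and the default latency of 300 ms is taken from the original post:

```python
def build_rtsp_pipeline(url: str, latency_ms: int = 300,
                        use_h264parse: bool = True,
                        sync: bool = False) -> str:
    """Build an RTSP -> nvv4l2decoder -> BGR appsink pipeline string
    for cv2.VideoCapture(..., cv2.CAP_GSTREAMER)."""
    elements = [
        f"rtspsrc location={url} latency={latency_ms}",
        "application/x-rtp, media=video",
        "rtph264depay",
    ]
    if use_h264parse:
        elements.append("h264parse")
    elements += [
        "nvv4l2decoder",               # hardware decode into NVMM NV12
        "nvvidconv",                   # hardware convert + copy to CPU memory
        "video/x-raw, format=BGRx",
        "videoconvert",                # CPU conversion BGRx -> BGR
        "video/x-raw, format=BGR",
        f"appsink sync={'true' if sync else 'false'}",
    ]
    return " ! ".join(elements)
```

For example, `build_rtsp_pipeline("rtsp://127.0.0.1:8554/test", use_h264parse=False)` gives the same chain without h264parse, and the result is passed to `cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)`.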

Hi @DaneLLL @kayccc @Honey_Patouceul
I have some questions; please guide me if possible.

a) With cv2.VideoCapture + GStreamer, this solution copies the decoded frames from the NVMM buffer to a CPU buffer, so each decoded frame is effectively copied twice, right?

b) Jetson Nano uses shared memory, so CPU and GPU memory are the same, right? Then why do we need GPU memory? Isn't everything in CPU memory also in GPU memory?

c) If I use cv2.VideoCapture + GStreamer with the H.264 HW decoder, the decoded frames are copied from the NVMM buffer to a CPU buffer. In this case, does one decoded frame take twice the memory?

d) In the same setup (cv2.VideoCapture + GStreamer with the H.264 HW decoder, frames copied from the NVMM buffer to a CPU buffer), if I want to use the GPU for pre/post-processing, do we need to copy again from CPU memory to GPU memory? In that case, does one decoded frame take three times the memory?

e) We know the disadvantage of GStreamer + OpenCV is copying GPU memory to CPU memory, and I agree. But this link uses a pure GStreamer pipeline with Python code. There, the decoded frames go to GPU memory without being copied into CPU memory, but in the part of that link I highlighted (line 123), the decoded frames are converted to NumPy format, which requires CPU memory. Is GPU memory also copied to CPU memory in this case? In terms of performance, are the two the same? Does the copying in OpenCV + GStreamer differ from the copying in that link, and which one is optimal?

f) If I want to access decoded frames without converting them to NumPy format, i.e. do preprocessing directly in GPU memory, how can I do this? Do I need to convert them to NumPy format first and then preprocess them on the GPU?
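For a rough sense of scale for questions (c) and (d), per-frame buffer sizes follow from the pixel formats: NV12 uses 1.5 bytes per pixel (the "1.5x larger than true height" in the OpenCV comments above), BGRx 4, and BGR 3. A small calculation for 1920x1080; it counts one buffer per format and ignores row pitch/alignment padding and how many buffers each element keeps in its pool, so treat the numbers as lower bounds:

```python
# Bytes per pixel as (numerator, denominator) to keep the arithmetic exact.
BYTES_PER_PIXEL = {
    "NV12": (3, 2),  # full-size Y plane + interleaved UV plane at half that size
    "BGRx": (4, 1),  # what nvvidconv hands to the CPU
    "BGR": (3, 1),   # what OpenCV receives after videoconvert
}


def frame_bytes(width: int, height: int, fmt: str) -> int:
    """Size of one tightly packed frame, ignoring row pitch/padding."""
    num, den = BYTES_PER_PIXEL[fmt]
    return width * height * num // den


for fmt in ("NV12", "BGRx", "BGR"):
    size = frame_bytes(1920, 1080, fmt)
    print(f"{fmt:<5} {size:>10,d} bytes ({size / 2**20:.1f} MiB)")
```

Multiplying by the number of buffers each stage keeps in flight, and by the number of streams, gives a rough total.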

Hi,
The function also copies data from the NVMM buffer to a CPU buffer:

frame_image=np.array(n_frame,copy=True,order='C')

In the sample, it checks once every 30 frames. If you check every frame, performance will degrade.
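The cost in question is the `copy=True` step: it allocates a new array and copies every byte, whereas a view shares the original memory. A minimal NumPy sketch of the difference, using an ordinary CPU array as a stand-in for the mapped frame, plus a hypothetical `should_copy` helper that follows the sample's idea of copying only once every 30 frames (the interval is the sample's; the helper is not):

```python
import numpy as np


def should_copy(frame_index: int, interval: int = 30) -> bool:
    """Copy (and process) only every `interval`-th frame."""
    return frame_index % interval == 0


# Stand-in for a mapped frame buffer: one 1080p BGRx frame.
source = np.zeros((1080, 1920, 4), dtype=np.uint8)

view = np.asarray(source)                        # no copy: shares memory
copied = np.array(source, copy=True, order="C")  # new allocation + full copy

print(np.shares_memory(view, source))    # True
print(np.shares_memory(copied, source))  # False
print(copied.nbytes)                     # 8294400 bytes copied per frame

processed = [i for i in range(90) if should_copy(i)]
print(processed)  # [0, 30, 60]
```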

This is the optimal solution with Python OpenCV + GStreamer. In C, you can leverage the dsexample plugin to process NVMM buffers with CUDA.