Gradually increasing memory usage when using GStreamer + OpenCV

Hi,
The decoded frames are NVMM buffers in NV12 format. OpenCV accepts CPU buffers in BGR format. Due to a limitation of the hardware engine, we convert to BGRx format first and then copy to CPU buffers:

video/x-raw(memory:NVMM),format=(string)NV12 ! nvvidconv ! video/x-raw, format=(string)BGRx

Then utilize videoconvert to convert to BGR format.
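Putting those two conversion steps together, a capture pipeline along these lines could be built; this is only a sketch, and the RTSP URI and latency value are placeholders:

```python
def build_capture_pipeline(uri, latency=300):
    """Assemble the conversion chain described above.

    nvvidconv does the hardware NV12 -> BGRx conversion and copies the
    frame out of NVMM memory; videoconvert then produces the BGR layout
    that OpenCV's appsink expects. The URI and latency are placeholders.
    """
    return (
        f"rtspsrc location={uri} latency={latency} ! "
        "rtph264depay ! h264parse ! nvv4l2decoder ! "
        "video/x-raw(memory:NVMM), format=NV12 ! "
        "nvvidconv ! video/x-raw, format=BGRx ! "
        "videoconvert ! video/x-raw, format=BGR ! "
        "appsink"
    )
```

The resulting string would be passed to cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER), assuming OpenCV was built with GStreamer support.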

Thanks.
Eventually I have to copy the decoded frames into CPU buffers because of OpenCV. Isn't it better to pass the decoded frames to CPU buffers in the first step when I want to use OpenCV, i.e. without video/x-raw(memory:NVMM),format=(string)NV12?
Q1: What is the most efficient solution (ordering of GStreamer elements) you would prefer for passing decoded frames into OpenCV?
Q2: Is OpenCV the best way to use the decoded frames in Python code?

Hi,
NVMM buffers are hardware DMA buffers which are directly accessed by hardware blocks. The hardware decoder cannot decode to CPU buffers directly. For optimal performance, we suggest running a pure GStreamer pipeline in Python like:

Thanks,
But when I run the below pipeline in OpenCV, NVDEC is activated.

“rtspsrc location={} latency=300 ! nvvidconv ! video/x-raw, format=(string)BGRx ! videoconvert ! appsink”

So the decoder is using the hardware accelerator, right? On the other hand, I don't use video/x-raw(memory:NVMM),format=(string)NV12. Above, you said that adding this caps filter makes the decoded data use a GPU buffer. I want to know: when I don't use
video/x-raw(memory:NVMM),format=(string)NV12 and only use nvvidconv ! video/x-raw, format=(string)BGRx, is the decoded data loaded into a CPU buffer or a GPU buffer? If the answer is a GPU buffer, then why do we use video/x-raw(memory:NVMM),format=(string)NV12 at all? What is the advantage of that line in the pipeline?

Hi,
You may configure
$ export GST_DEBUG=*FACTORY*:4

And check the log to see whether nvv4l2decoder is picked:

0:00:00.136144226 11414   0x7f980158f0 INFO     GST_ELEMENT_FACTORY gstelementfactory.c:361:gst_element_factory_create: creating element "nvv4l2decoder"

If it is nvv4l2decoder, it is always video/x-raw(memory:NVMM) in src pad.

$ gst-inspect-1.0 nvv4l2decoder
(...skip)
  SRC template: 'src'
    Availability: Always
    Capabilities:
      video/x-raw(memory:NVMM)
                  width: [ 1, 2147483647 ]
                 height: [ 1, 2147483647 ]
(skip...)

I get these logs:
(...skip)
  SRC template: 'src'
    Availability: Always
    Capabilities:
      video/x-raw(memory:NVMM)
                  width: [ 1, 2147483647 ]
                 height: [ 1, 2147483647 ]
              framerate: [ 0/1, 2147483647/1 ]

Element has no clocking capabilities.
Element has no URI handling capabilities.

(…skip)

That shows that GStreamer supports nvv4l2decoder, right? When I use nvv4l2decoder in a terminal command, the decoder works correctly, but in OpenCV it only works with omxh264dec. Is it possible to make nvv4l2decoder work in OpenCV?

What do the following mean in the above?
width: [ 1, 2147483647 ]
height: [ 1, 2147483647 ]
framerate: [ 0/1, 2147483647/1 ]

I also see this. It seems nvv4l2decoder fails to keep sync. You could add sync=false:

cap = cv2.VideoCapture("rtspsrc location=rtsp://127.0.0.1:8554/test ! application/x-rtp, media=video ! rtph264depay ! h264parse ! nvv4l2decoder ! nvvidconv ! video/x-raw, format=BGRx ! videoconvert ! video/x-raw, format=BGR ! appsink sync=false", cv2.CAP_GSTREAMER)

Thanks,
Why do you use the same conversion twice?

nvvidconv ! video/x-raw, format=BGRx ! videoconvert ! video/x-raw, format=BGR

In my opinion, it's better to use something like this:

cap = cv2.VideoCapture("rtspsrc location=rtsp://127.0.0.1:8554/test ! application/x-rtp, media=video ! rtph264depay ! h264parse ! nvv4l2decoder ! nvvidconv ! video/x-raw(memory:NVMM), format=NV12 ! videoconvert ! video/x-raw, format=BGRx ! appsink sync=false", cv2.CAP_GSTREAMER)

What does sync mean?

Hi,
Please check the source code in OpenCV:

    // we support 11 types of data:
    //     video/x-raw, format=BGR   -> 8bit, 3 channels
    //     video/x-raw, format=GRAY8 -> 8bit, 1 channel
    //     video/x-raw, format=UYVY  -> 8bit, 2 channel
    //     video/x-raw, format=YUY2  -> 8bit, 2 channel
    //     video/x-raw, format=YVYU  -> 8bit, 2 channel
    //     video/x-raw, format=NV12  -> 8bit, 1 channel (height is 1.5x larger than true height)
    //     video/x-raw, format=NV21  -> 8bit, 1 channel (height is 1.5x larger than true height)
    //     video/x-raw, format=YV12  -> 8bit, 1 channel (height is 1.5x larger than true height)
    //     video/x-raw, format=I420  -> 8bit, 1 channel (height is 1.5x larger than true height)
    //     video/x-bayer             -> 8bit, 1 channel
    //     image/jpeg                -> 8bit, mjpeg: buffer_size x 1 x 1

BGRx is not supported.
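For the NV12/NV21/YV12/I420 entries in that list, appsink delivers a single-channel frame whose height is 1.5x the true height. A small NumPy-only sketch of how that packed frame splits back into its planes (the synthetic 4x4 frame is just for illustration):

```python
import numpy as np

def split_nv12(frame):
    """Split a single-channel NV12 frame (as appsink delivers it) into
    its luma (Y) plane and interleaved chroma (UV) plane.

    OpenCV reports the frame as (h * 3 // 2, w): the top two thirds are
    the full-resolution Y plane, the bottom third holds U and V values
    interleaved along the width at half resolution.
    """
    packed_h, w = frame.shape
    h = packed_h * 2 // 3                          # recover the true height
    y = frame[:h, :]                               # (h, w) luma plane
    uv = frame[h:, :].reshape(h // 2, w // 2, 2)   # half-res (U, V) pairs
    return y, uv

# Synthetic 4x4 NV12 frame: 6 packed rows (4 luma rows + 2 chroma rows)
frame = np.arange(24, dtype=np.uint8).reshape(6, 4)
y, uv = split_nv12(frame)
```

In practice, cv2.cvtColor(frame, cv2.COLOR_YUV2BGR_NV12) performs this unpacking plus the color conversion in one step.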

There is a synchronization mechanism in GStreamer: it synchronizes frame rendering to timestamps. In some cases it may drop a lot of frames when enabled. If you face this issue, you may disable it (sync=0) for a try.

I ran this command and it got stuck in this state. Why? When I replace nvv4l2decoder with omxh264dec, it works correctly.

I've seen a similar issue on Xavier R32.4 with h264parse before nvv4l2decoder, as reported here. You may thus try removing h264parse.

Hi @DaneLLL @kayccc @Honey_Patouceul
I have some questions; please guide me if possible.

a) Using cv2.VideoCapture + GStreamer, this solution copies the decoded frames from the NVMM buffer to a CPU buffer, so a duplicate copy occurs for each decoded frame, right?

b) The Jetson Nano uses shared memory, so CPU and GPU memory are the same, right? Then why do we need GPU memory? Isn't everything in CPU memory also in GPU memory?

c) If I use cv2.VideoCapture + GStreamer with the H.264 HW decoder, the decoded frames are copied from the NVMM buffer to a CPU buffer. In this case, does one decoded frame consume memory twice out of the whole memory?

d) If I use cv2.VideoCapture + GStreamer with the H.264 HW decoder, the decoded frames are copied from the NVMM buffer to a CPU buffer. If I then want to use the GPU for pre/post-processing, do we again need to copy from CPU memory to GPU memory? In that case, do we use memory three times out of the whole memory for one decoded frame?

e) We know the disadvantage of GStreamer+OpenCV is copying GPU memory to CPU memory, and I agree with this. But in this link, a pure GStreamer pipeline is used with Python code. In that case the decoded frames go to GPU memory without being copied into CPU memory, but at the line I highlighted (line 123), the decoded frames are brought into NumPy format, which requires CPU memory. So in that case we also copy GPU memory to CPU memory. In terms of performance, are the two the same? Is the copying in OpenCV+GStreamer different from that link? Which one is optimal?

f) If I want to access the decoded frames without converting to NumPy format, i.e. do pre-processing directly in GPU memory, how can I do this? Or do I need to bring the frames into NumPy format first and then do the pre-processing on the GPU?

Hi,
The function also copies data from the NVMM buffer to a CPU buffer:

frame_image=np.array(n_frame,copy=True,order='C')

In the sample, it checks once per 30 frames. If you check every frame, there will be performance degradation.
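The once-per-30-frames check can be sketched as a simple counter gate around the copy; the function name and interval here are illustrative, not from the sample:

```python
import numpy as np

CHECK_INTERVAL = 30  # copy out of NVMM only once per 30 frames

def maybe_copy(n_frame, frame_index):
    """Return a CPU-side copy only on every CHECK_INTERVAL-th frame.

    np.array(..., copy=True) is the step that moves the data out of the
    NVMM buffer; skipping it on most frames avoids paying the per-frame
    copy cost mentioned above.
    """
    if frame_index % CHECK_INTERVAL != 0:
        return None
    return np.array(n_frame, copy=True, order='C')
```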

This is the optimal solution in Python with OpenCV+GStreamer. In C, you can leverage the dsexample plugin to process NVMM buffers through CUDA programming.
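To put rough numbers on the extra copies asked about in (c) and (d) above, a quick per-frame size calculation; 1080p is assumed purely for illustration:

```python
def frame_bytes(width, height, bytes_per_pixel):
    """Raw size in bytes of one uncompressed frame."""
    return int(width * height * bytes_per_pixel)

W, H = 1920, 1080
nv12 = frame_bytes(W, H, 1.5)   # decoder output in NVMM: 12 bits/pixel
bgrx = frame_bytes(W, H, 4)     # after nvvidconv: 4 bytes/pixel
bgr  = frame_bytes(W, H, 3)     # what OpenCV receives: 3 bytes/pixel
```

So each 1080p frame costs roughly 3 MB in NVMM plus 8 MB and 6 MB for the intermediate and final CPU copies, which is why avoiding per-frame copies matters.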

Thanks @DaneLLL
If I want to connect a USB Coral TPU to the Nano and feed some frames to it, do I have to copy the decoded frames from the NVMM buffer to a CPU buffer? Is it possible for the USB TPU to access data in the NVMM buffer directly?

Hi,
We don't have experience with that device. Perhaps other users can share suggestions.

For running deep learning, we would suggest using the DeepStream SDK.

I tested the DeepStream SDK and found that multi-stream decoding doesn't use extra memory, whereas copying the decoded frames from the NVMM buffer to a CPU buffer uses more memory. I want to know: is the NVMM buffer independent of the Jetson's memory, while the CPU buffer depends on it?

Hi,
The following explanation may help. On an x86 PC with an NVIDIA GPU, the GPU buffer is device memory and the CPU buffer is host memory. On Jetson platforms, the NVMM buffer is a sort of device memory.

NVMM buffer is sort of device memory.

For example, on a Jetson Nano with 4GB of DRAM, does the NVMM buffer come out of that 4GB of RAM?

Hi,

Yes, NVMM buffers are allocated on DRAM.