Exploring the GStreamer pipeline with OpenCV

Hi @Honey_Patouceul

cap = cv2.VideoCapture("rtspsrc location=rtsp://IP:PORT/1920x1080.264 ! rtph264depay ! h264parse ! \
    video/x-h264, stream-format=avc ! h264parse ! video/x-h264, stream-format=byte-stream ! nvv4l2decoder ! video/x-raw(memory:NVMM), format=NV12 ! nvvidconv ! video/x-raw, format=(string)BGRx ! videoconvert ! video/x-raw,format=BGR ! appsink")

I have some questions about this GStreamer pipeline with OpenCV.
1- In this pipeline, are the decoded frames copied from NVMM to CPU memory? If so, is memory then allocated twice for the decoded frames?
2- nvvidconv ! video/x-raw, format=(string)BGRx: is this conversion performed in NVMM or in CPU memory?
3- videoconvert ! video/x-raw,format=BGR: what about this one?
4- If I want to access the decoded frames in NVMM memory without a copy to CPU memory, is it possible to run this GStreamer pipeline with OpenCV? How?

Hi,

Yes, the decoded frames are copied from NVMM to CPU. The flow is

decoded NV12 frame in NVMM buffer -> convert to BGRx in NVMM buffer -> copy to BGRx CPU buffer

It is a software conversion from BGRx CPU buffer to BGR CPU buffer.

It is not possible in OpenCV. NVMM buffers are not supported in OpenCV; only BGR CPU buffers are supported.
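
To make the flow concrete, here is an annotated sketch of the same pipeline (the RTSP URL is a placeholder, and this assumes OpenCV built with GStreamer support), marking where each stage operates:

import cv2

pipeline = (
    "rtspsrc location=rtsp://IP:PORT/1920x1080.264 ! "
    "rtph264depay ! h264parse ! "                # compressed H.264, CPU memory
    "nvv4l2decoder ! "                           # HW decode -> NV12 frame in NVMM
    "video/x-raw(memory:NVMM), format=NV12 ! "
    "nvvidconv ! video/x-raw, format=BGRx ! "    # HW convert to BGRx, then copy NVMM -> CPU
    "videoconvert ! video/x-raw, format=BGR ! "  # SW convert BGRx -> BGR, CPU only
    "appsink"                                    # OpenCV reads the BGR CPU buffer
)
cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)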

Thanks a lot @DaneLLL

Yes, the decoded frames are copied from NVMM to CPU. The flow is

Is it possible to release the decoded frames from the NVMM buffer once they have been copied into the CPU buffer? And is it possible to limit the queue depth of the NVMM and CPU buffers?

What’s the difference between videoconvert ! video/x-raw,format=BGR and plain videoconvert?
When I put videoconvert alone instead of videoconvert ! video/x-raw,format=BGR, it still works correctly. Why? With only videoconvert, does the conversion to BGR not occur, or is it done by default in videoconvert?

Is the pipeline above efficient for decoding? If not, how can I solve the problem?

Hi,

The hardware decoder decodes frames into NVMM buffers. OpenCV takes CPU buffers, so the memory copy is required if you have to use OpenCV.

If caps are not specified in the pipeline, automatic caps negotiation takes place. By default, appsink requests CPU buffers in BGR format, so the pipeline still runs like

videoconvert ! video/x-raw,format=BGR ! appsink

It is the optimal solution for running OpenCV on Jetson platforms. Due to a limitation of the hardware converter, a software converter is needed to convert RGBA/BGRx to BGR.
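
As a sketch of this negotiation (same placeholder RTSP URL as above), the following two captures should behave identically, since OpenCV’s appsink requests BGR by default:

import cv2

# With explicit caps before appsink:
explicit = ("rtspsrc location=rtsp://IP:PORT/1920x1080.264 ! rtph264depay ! "
            "h264parse ! nvv4l2decoder ! nvvidconv ! video/x-raw, format=BGRx ! "
            "videoconvert ! video/x-raw, format=BGR ! appsink")

# Without them: appsink still negotiates BGR CPU buffers automatically.
implicit = ("rtspsrc location=rtsp://IP:PORT/1920x1080.264 ! rtph264depay ! "
            "h264parse ! nvv4l2decoder ! nvvidconv ! video/x-raw, format=BGRx ! "
            "videoconvert ! appsink")

cap = cv2.VideoCapture(implicit, cv2.CAP_GSTREAMER)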

Thanks a lot, @DaneLLL
1- What’s the difference between the two pipelines below?

cap1 = cv2.VideoCapture("rtspsrc location=rtsp://IP:PORT/1920x1080.264 ! rtph264depay ! h264parse ! \
    video/x-h264, stream-format=avc ! h264parse ! video/x-h264, stream-format=byte-stream ! nvv4l2decoder ! video/x-raw(memory:NVMM), format=NV12 ! nvvidconv ! video/x-raw, format=(string)BGRx ! videoconvert ! video/x-raw,format=BGR ! appsink")


cap2 = cv2.VideoCapture("rtspsrc location=rtsp://IP:PORT/1920x1080.264 ! rtph264depay ! h264parse ! \
    video/x-h264, stream-format=avc ! h264parse ! queue max-size-buffers=10 ! video/x-h264, stream-format=byte-stream ! nvv4l2decoder ! video/x-raw(memory:NVMM), format=NV12 ! nvvidconv ! video/x-raw, format=(string)BGRx ! videoconvert ! video/x-raw,format=BGR ! queue max-size-buffers=10 ! appsink")

In cap2 I used two queue elements in the pipeline. I want to know: if I use queue in the pipeline, does it use more memory than cap1? And when do we need to use queue in a pipeline?

In my opinion, if I put one queue after each element, then we will not lose any data at any element of the pipeline, right? Will this cause more memory use?

Hi,

The queue plugin is a native GStreamer plugin and we don’t know the details of how it works. Could you please go to the GStreamer forum for further information? More experienced users there may provide more detail about native GStreamer plugins.
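
For what it’s worth, the buffering limits of queue are controlled by standard upstream GStreamer properties (generic GStreamer behavior, not Jetson-specific); a sketch:

# Sketch: cap the queue at 10 buffers, disable the byte/time limits, and
# drop the oldest buffers instead of blocking when the queue is full.
queue_str = "queue max-size-buffers=10 max-size-bytes=0 max-size-time=0 leaky=downstream"

Each queue also introduces a thread boundary, and its memory cost is bounded by the max-size-* limits it is given.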

Q1- I want to know the disadvantages of using OpenCV+GStreamer compared to pure GStreamer. I know that in the OpenCV+GStreamer solution we have to copy decoded frames from GPU memory (NVMM buffers) to CPU buffers; this I/O cost makes streaming slower than pure GStreamer. Apart from the I/O cost, are there other cons of this solution compared to pure GStreamer? What are they?
Q2- Can copying data from GPU memory to CPU memory cause double usage of DRAM, or is the GPU memory freed once the data has been copied to CPU memory?

Q3- What’s the difference between video/x-raw and video/x-raw(memory:NVMM)?
When a GStreamer element supports both video/x-raw and video/x-raw(memory:NVMM), does that mean the element can work with and access both GPU and CPU memory?

Hi,

Besides the memory copy, another concern is that OpenCV is a CPU-based framework, so it is better to use a platform with stronger CPU capability.

The buffer is returned to the previous element for the next operation. For example, if you run

... ! nvvidconv ! video/x-raw,format=BGRx ! videoconvert ! video/x-raw,format=BGR ! appsink

Once videoconvert has done the BGRx-to-BGR conversion, the BGRx buffer is returned to nvvidconv.

No, it means the element can convert video/x-raw to video/x-raw(memory:NVMM) or video/x-raw(memory:NVMM) to video/x-raw.
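
In other words, the caps placed on either side of such an element select the direction of the transfer. A sketch with nvvidconv (pipeline fragments only, for illustration):

# nvvidconv advertises both video/x-raw and video/x-raw(memory:NVMM) on its
# pads, so the surrounding caps decide which way the data moves.

# NVMM -> CPU (as in the pipelines in this thread):
nvmm_to_cpu = "nvvidconv ! video/x-raw, format=BGRx"

# CPU -> NVMM (e.g., to hand CPU frames to the HW encoder):
cpu_to_nvmm = "nvvidconv ! video/x-raw(memory:NVMM), format=NV12"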

Thanks so much,

Besides the memory copy, another concern is that OpenCV is a CPU-based framework, so it is better to use a platform with stronger CPU capability.

OK, you’re right that OpenCV is CPU-based, but the main processing is done on the GPU.

1- In the DeepStream workflow above, as you can see, the batching of images is done on the CPU before the frames are fed to the DNN, so we need to copy from GPU memory into CPU memory, right? I want to know: what’s the main difference between the workflow above and the OpenCV solution? In both solutions we have to copy the decoded images into CPU memory and then process them.
You’re right that OpenCV copies data from GPU memory to CPU memory, and that’s a bottleneck, but DeepStream also copies from GPU memory into CPU memory before the data is fed to the DNN.

No, it means the element can convert video/x-raw to video/x-raw(memory:NVMM) or video/x-raw(memory:NVMM) to video/x-raw.

2- Does the conversion from video/x-raw(memory:NVMM) to video/x-raw copy the data from GPU memory into CPU memory?

3- The Jetson Nano is capable of encoding 8x 1080p @ 30 fps. I want to know: is it possible to encode 16x 1080p @ 15 fps?

Hi,
Batching is done in nvstreammux and does not involve a memory copy. It collects frames from each source to form a source list and sends it to the next element. The frames are passed to the next element directly; no memory copy is executed.

The supported software features are listed in the development guide. You may try the multi-encoding-thread case.

Hi @DaneLLL,

Batching is done in nvstreammux and does not involve a memory copy. It collects frames from each source to form a source list and sends it to the next element.

The decoder places the frames in the NVMM buffer (GPU memory) before batching, and as you can see in the picture above, the batching is performed on the CPU. In that case, can the CPU directly access GPU memory?

Hi,
The pipeline runs like

... ! video/x-raw(memory:NVMM) ! nvstreammux ! video/x-raw(memory:NVMM) ! nvinfer ! ...

The nvstreammux plugin collects NVMM buffers into a list and sends them to the next element, nvinfer. No memory copy is performed.
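
A minimal two-source sketch of this (placeholder RTSP URLs; fakesink stands in for nvinfer, which would also need a config file). It can be run with gst-launch-1.0 or Gst.parse_launch:

# Both decoded streams stay in NVMM; nvstreammux only gathers the two
# buffers into one batch. Sink pads are requested as mux.sink_0, mux.sink_1.
pipeline = (
    "nvstreammux name=mux batch-size=2 width=1920 height=1080 ! fakesink "
    "rtspsrc location=rtsp://IP:PORT/cam0 ! rtph264depay ! h264parse ! "
    "nvv4l2decoder ! mux.sink_0 "
    "rtspsrc location=rtsp://IP:PORT/cam1 ! rtph264depay ! h264parse ! "
    "nvv4l2decoder ! mux.sink_1"
)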

@DaneLLL, OK, I understand what you’re saying, but my question is this:
In the picture above, why is the batching of data attributed to the CPU? If batching is really performed on the CPU, then we would need to push data from the NVMM buffer to a CPU buffer, because the CPU can’t directly access the NVMM buffer, right?

Hi,
It collects frames from each source and generates metadata such as NvDsMetaList, NvDsFrameMeta, and NvDsBatchMeta. This operation is performed on the CPU. It does not push data from the NVMM buffer to a CPU buffer.
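
For instance, with the DeepStream Python bindings (pyds), a pad probe downstream of nvstreammux can read this metadata on the CPU while the frames themselves stay in NVMM. A sketch (attaching the probe to a pad is not shown):

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import pyds

def on_mux_src_buffer(pad, info, user_data):
    gst_buffer = info.get_buffer()
    # The batch metadata lives in ordinary CPU memory...
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        # ...but it only describes the NVMM frames; no pixel data is copied here.
        print(frame_meta.source_id, frame_meta.frame_num)
        l_frame = l_frame.next
    return Gst.PadProbeReturn.OK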

OK, you’re right, @DaneLLL

1- I want to confirm: the decoded frames produced by the HW decoder exist in the NVMM buffer before batching is performed by the CPU, right?
2- If yes, can the CPU directly access the NVMM buffer to perform batching? My main question is: can the CPU directly access the NVMM buffer at all?
3- If the CPU really can access the NVMM buffer, and OpenCV is CPU-based, why can’t OpenCV work on the NVMM buffer directly when we decode frames with GStreamer, instead of copying the frames from the NVMM buffer to a CPU buffer?

Hi,
In the batching operation, we don’t access the NVMM buffers. We organize the buffers from each source into a list. For example, if we have two sources, nvstreammux receives one NVMM buffer from source 0 and one NVMM buffer from source 1 every 33 ms (assuming 30 fps). The two buffers are made into a surface list and sent to the next element, nvinfer. The data in the NVMM buffers is not accessed through the CPU.
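
Conceptually (a plain Python analogy, not a real DeepStream API), batching behaves like collecting references:

# Two stand-ins for the per-source NVMM surfaces (one per source per ~33 ms).
source0_surface = bytearray(1920 * 1080 * 3 // 2)  # NV12-sized placeholder
source1_surface = bytearray(1920 * 1080 * 3 // 2)

# The "batch" is just a list of references; no pixel data is duplicated.
surface_list = [source0_surface, source1_surface]
assert surface_list[0] is source0_surface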