OpenCV VideoCapture performance problem with GStreamer


We are trying to capture two 4K@30fps H.264 videos, either from a file or from an RTSP source. Using gst-launch with the pipeline below gives us excellent performance with hardware acceleration. With OpenCV, however, even a single video uses 100% CPU on all cores and no GPU at all, and cv::VideoCapture alone takes 45 ms, without imshow or any other processing.

gst-launch-1.0 filesrc location=<filename.mp4> ! qtdemux name=demux demux.video_0 ! queue ! h264parse ! omxh264dec ! nveglglessink -e

It’s apparent that OpenCV uses GStreamer for video I/O, but somehow it does not use hardware acceleration at all.

The GStreamer version is 1.8.3, and OpenCV is compiled with GStreamer support against the same version. We are developing the software in C++.

We even tried compiling NvPipe, but it requires nvcuvid, which has been dropped on Jetson in favor of GStreamer and V4L.

Is it possible to leverage hardware acceleration in some way and pass the frames to OpenCV as cv::Mat?

Thank you

It is possible to use a GStreamer pipeline ending with appsink as the source of an OpenCV cv::VideoCapture. Note that the OpenCV 3.3.1 build shipped with JetPack has neither CUDA nor GStreamer support, so you would have to build your own. You can use this script or read this page.
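Because the stock JetPack build lacks GStreamer support, it is worth confirming that the OpenCV you actually link against has it. cv::getBuildInformation() returns the build configuration as text, and the heuristic sketch below scans that text for the GStreamer entry. The helper names are hypothetical, and the layout of the report differs between OpenCV versions (newer releases print a single `GStreamer: YES (...)` line, some older ones print an indented list of sub-modules), so treat this as a convenience check under those assumptions, not a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Count leading spaces, used to tell sub-entries from the next top-level entry.
static std::size_t indent_of(const std::string &line)
{
    std::size_t n = 0;
    while (n < line.size() && line[n] == ' ')
        ++n;
    return n;
}

// Hypothetical heuristic: true if the "GStreamer:" entry of a
// cv::getBuildInformation() report, or one of its indented sub-entries, says YES.
// Assumes the report layout described above; it may differ in your version.
bool build_has_gstreamer(const std::string &build_info)
{
    const std::string key = "GStreamer:";
    std::istringstream in(build_info);
    std::string line;
    bool in_sublist = false;
    std::size_t entry_indent = 0;

    while (std::getline(in, line)) {
        if (!in_sublist) {
            const std::size_t pos = line.find(key);
            if (pos == std::string::npos)
                continue;  // not the GStreamer entry yet
            const std::string value = line.substr(pos + key.size());
            if (value.find("YES") != std::string::npos)
                return true;   // single-line form, e.g. "GStreamer: YES (1.x)"
            if (value.find("NO") != std::string::npos)
                return false;  // explicitly disabled
            in_sublist = true; // value is spread over the indented lines below
            entry_indent = indent_of(line);
        } else {
            if (indent_of(line) <= entry_indent)
                return false;  // reached the next entry without seeing YES
            if (line.find("YES") != std::string::npos)
                return true;
        }
    }
    return false;  // entry missing, or report ended inside the sub-list
}
```

In your program you would call build_has_gstreamer(cv::getBuildInformation()) after including <opencv2/core.hpp>, or simply print the report once and read its Video I/O section by eye.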

Then you would use something like:

#include <iostream>
#include <opencv2/core.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/videoio.hpp>

int main(void)
{
    // RTSP H.264 stream -> decodebin (should pick the HW decoder) -> NV12
    // -> videoconvert (CPU) -> BGR -> appsink, which feeds cv::VideoCapture.
    // Put your stream URL after "location=".
    const char *gst = "rtspsrc location=rtsp:// ! application/x-rtp, media=(string)video "
                      "! decodebin ! video/x-raw, format=(string)NV12 "
                      "! videoconvert ! video/x-raw, format=(string)BGR "
                      "! appsink";

    cv::VideoCapture cap(gst);
    if (!cap.isOpened()) {
        std::cout << "Error: failed to open the pipeline" << std::endl;
        return -1;
    }

    unsigned int width = static_cast<unsigned int>(cap.get(cv::CAP_PROP_FRAME_WIDTH));
    unsigned int height = static_cast<unsigned int>(cap.get(cv::CAP_PROP_FRAME_HEIGHT));
    unsigned int pixels = width * height;
    std::cout << "Frame size : " << width << " x " << height << ", " << pixels << " Pixels " << std::endl;

    cv::namedWindow("RTSP_Preview", cv::WINDOW_AUTOSIZE);
    cv::Mat frame_in(height, width, CV_8UC3); // cv::Mat takes (rows, cols)

    while (true) {
        if (!cap.read(frame_in)) {
            std::cout << "Capture read error" << std::endl;
            break;
        }
        cv::imshow("RTSP_Preview", frame_in);
        cv::waitKey(1); // let imshow draw
    }

    cap.release();
    return 0;
}

Note that videoconvert runs on the CPU only and is costly at 4K resolution. If your OpenCV processing can be done in YUV format (I420 or NV12), you can remove videoconvert from the pipeline and receive I420 or NV12 frames instead.
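As a rough illustration of why skipping videoconvert helps, the sketch below computes the NV12 byte layout (a full-resolution Y plane followed by an interleaved UV plane at half width and half height) and compares per-second data volume with BGR. The struct and function names are hypothetical, and the code assumes tightly packed rows (stride equal to width), which real decoder buffers do not always guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type; assumes tightly packed rows (stride == width).
struct Nv12Layout {
    std::size_t y_bytes;   // full-resolution luma plane: width * height
    std::size_t uv_bytes;  // interleaved U/V at half width and half height
    std::size_t total() const { return y_bytes + uv_bytes; }
};

Nv12Layout nv12_layout(std::size_t width, std::size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);  // 4:2:0 subsampling needs even sizes
    return Nv12Layout{width * height, width * height / 2};
}

// Byte offset of the luma sample for pixel (x, y).
std::size_t y_offset(std::size_t width, std::size_t x, std::size_t y)
{
    return y * width + x;
}

// Byte offset of the U sample shared by the 2x2 block containing (x, y);
// the matching V sample is the next byte.
std::size_t uv_offset(std::size_t width, std::size_t height, std::size_t x, std::size_t y)
{
    return width * height + (y / 2) * width + (x / 2) * 2;
}

// Bytes per second for one stream, e.g. to compare BGR (3 bytes/pixel)
// with NV12 (1.5 bytes/pixel).
std::uint64_t bytes_per_second(std::uint64_t frame_bytes, unsigned fps)
{
    return frame_bytes * fps;
}
```

For the two 4K@30fps streams in the question, that works out to about 1.49 GB/s of BGR output versus about 0.75 GB/s of NV12, and with videoconvert in the pipeline the CPU produces all of that BGR data. If your OpenCV version hands NV12 frames over as a single-channel cv::Mat with height*3/2 rows (an assumption worth checking with frame.rows and frame.type()), the first height rows are the Y plane, and cv::cvtColor with cv::COLOR_YUV2BGR_NV12 converts only the frames you really need in BGR.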
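Note also that in this example I’ve used decodebin, which in turn should select the HW decoder. If you want to be sure, you can replace ‘decodebin’ with an explicit chain: ‘rtph264depay ! h264parse ! omxh264dec’ for an RTSP source, or ‘qtdemux ! h264parse ! omxh264dec’ after filesrc when reading an MP4 file (qtdemux only handles MP4/QuickTime containers, not RTP).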
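The explicit decoder chains can be kept in one place with small string builders, so that both of the question's sources (MP4 file and RTSP) produce a pipeline ending in appsink. For an RTSP source the RTP payload has to be depayloaded with rtph264depay; qtdemux only applies to MP4/QuickTime files. This is a sketch: the function names are hypothetical, it assumes an H.264 stream and the omxh264dec decoder used in this thread, and the NV12 caps filter mirrors the pipeline above. If caps negotiation fails on your L4T release, inserting NVIDIA's nvvidconv element after omxh264dec is a commonly used workaround; test each string with gst-launch-1.0 before handing it to OpenCV.

```cpp
#include <cassert>
#include <string>

// Common tail: NV12 from the decoder, then either convert to BGR on the CPU
// (videoconvert) or pass NV12 straight through to appsink.
static std::string appsink_tail(bool to_bgr)
{
    const std::string nv12 = " ! video/x-raw, format=(string)NV12";
    if (to_bgr)
        return nv12 + " ! videoconvert ! video/x-raw, format=(string)BGR ! appsink";
    return nv12 + " ! appsink";
}

// Hypothetical helper: MP4 file -> demux container -> parse H.264 -> HW decode.
std::string file_pipeline(const std::string &path, bool to_bgr)
{
    assert(!path.empty());
    return "filesrc location=" + path +
           " ! qtdemux ! h264parse ! omxh264dec" + appsink_tail(to_bgr);
}

// Hypothetical helper: RTSP stream -> depayload RTP (no container, so no
// qtdemux) -> parse H.264 -> HW decode.
std::string rtsp_pipeline(const std::string &url, bool to_bgr)
{
    assert(!url.empty());
    return "rtspsrc location=" + url +
           " ! rtph264depay ! h264parse ! omxh264dec" + appsink_tail(to_bgr);
}
```

You would then open each of the two streams with its own capture, for example cv::VideoCapture cap(file_pipeline("<filename.mp4>", true)); newer OpenCV releases also accept cv::CAP_GSTREAMER as a second constructor argument to force the GStreamer backend. Passing false keeps videoconvert out of the pipeline, as suggested in the note about YUV processing.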