OpenCV and CUDA


I am trying to use OpenCV and CUDA together on a Jetson TX1. I have OpenCV 3.4.1 built with CUDA support.

But I get this error:

OpenCV(3.4.1-dev) Error: The function/feature is not implemented (The called functionality is disabled for current build or platform) in throw_no_cuda, file /home/nvidia/devel/opencv/opencv/modules/core/include/opencv2/core/private.cuda.hpp, line 111
terminate called after throwing an instance of 'cv::Exception'
  what():  OpenCV(3.4.1-dev) /home/nvidia/devel/opencv/opencv/modules/core/include/opencv2/core/private.cuda.hpp:111: error: (-213) The called functionality is disabled for current build or platform in function throw_no_cuda


Here is the code, taken from the OpenCV example:

#include <iostream>

#include "opencv2/opencv_modules.hpp"

#if defined(HAVE_OPENCV_CUDACODEC)

#include <string>
#include <vector>
#include <algorithm>
#include <numeric>

#include <opencv2/core.hpp>
#include <opencv2/core/opengl.hpp>
#include <opencv2/cudacodec.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/cudaimgproc.hpp>

int main(int argc, const char* argv[])
{
    const std::string fname = "rtsp://";
    // const std::string fname = "/home/vbellet/dev/sportview/videos/TheBourne.mp4";

    std::cout << cv::getBuildInformation() << std::endl;

    cv::namedWindow("CPU", cv::WINDOW_NORMAL);
    cv::namedWindow("GPU", cv::WINDOW_OPENGL);

    // CPU path: regular VideoCapture.
    cv::Mat frame;
    cv::VideoCapture reader(fname);

    // GPU path: cudacodec reader. This is the call that throws on the TX1.
    cv::cuda::GpuMat d_frame;
    cv::Ptr<cv::cudacodec::VideoReader> d_reader = cv::cudacodec::createVideoReader(fname);

    cv::TickMeter tm;
    std::vector<double> cpu_times;
    std::vector<double> gpu_times;

    int gpu_frame_count=0, cpu_frame_count=0;

    // Time each frame read on the CPU.
    for (;;)
    {
        tm.reset(); tm.start();
        if (!reader.read(frame))
            break;
        tm.stop();
        cpu_times.push_back(tm.getTimeMilli());
        cpu_frame_count++;

        cv::imshow("CPU", frame);

        if (cv::waitKey(3) > 0)
            break;
    }

    // Time each frame read on the GPU.
    for (;;)
    {
        tm.reset(); tm.start();
        if (!d_reader->nextFrame(d_frame))
            break;
        tm.stop();
        gpu_times.push_back(tm.getTimeMilli());
        gpu_frame_count++;

        cv::imshow("GPU", d_frame);

        if (cv::waitKey(3) > 0)
            break;
    }

    if (!cpu_times.empty() && !gpu_times.empty())
    {
        std::cout << std::endl << "Results:" << std::endl;

        std::sort(cpu_times.begin(), cpu_times.end());
        std::sort(gpu_times.begin(), gpu_times.end());

        double cpu_avg = std::accumulate(cpu_times.begin(), cpu_times.end(), 0.0) / cpu_times.size();
        double gpu_avg = std::accumulate(gpu_times.begin(), gpu_times.end(), 0.0) / gpu_times.size();

        std::cout << "CPU : Avg : " << cpu_avg << " ms FPS : " << 1000.0 / cpu_avg << " Frames " << cpu_frame_count << std::endl;
        std::cout << "GPU : Avg : " << gpu_avg << " ms FPS : " << 1000.0 / gpu_avg << " Frames " << gpu_frame_count << std::endl;
    }

    return 0;
}

#else

int main()
{
    std::cout << "OpenCV was built without CUDA Video decoding support\n" << std::endl;
    return 0;
}

#endif

And here is the OpenCV configuration:

General configuration for OpenCV 3.4.1-dev =====================================
  Version control:               3.4.1-9-gec0bb66-dirty

    Timestamp:                   2018-11-01T15:00:09Z
    Host:                        Linux 4.4.38-tegra aarch64
    CMake:                       3.5.1
    CMake generator:             Unix Makefiles
    CMake build tool:            /usr/bin/make
    Configuration:               RELEASE

  CPU/HW features:
    Baseline:                    NEON FP16
      required:                  NEON
      disabled:                  VFPV3

    Built as dynamic libs?:      YES
    C++ Compiler:                /usr/bin/c++  (ver 5.4.0)
    C++ flags (Release):         -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wno-narrowing -Wno-delete-non-virtual-dtor -Wno-comment -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections    -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG  -DNDEBUG
    C++ flags (Debug):           -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wno-narrowing -Wno-delete-non-virtual-dtor -Wno-comment -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections    -fvisibility=hidden -fvisibility-inlines-hidden -g  -O0 -DDEBUG -D_DEBUG
    C Compiler:                  /usr/bin/cc
    C flags (Release):           -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Winit-self -Wno-narrowing -Wno-comment -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections    -fvisibility=hidden -O3 -DNDEBUG  -DNDEBUG
    C flags (Debug):             -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Winit-self -Wno-narrowing -Wno-comment -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections    -fvisibility=hidden -g  -O0 -DDEBUG -D_DEBUG
    Linker flags (Release):      
    Linker flags (Debug):        
    ccache:                      NO
    Precompiled headers:         YES
    Extra dependencies:          dl m pthread rt /usr/lib/aarch64-linux-gnu/ /usr/lib/aarch64-linux-gnu/ cudart nppc nppial nppicc nppicom nppidei nppif nppig nppim nppist nppisu nppitc npps cublas cufft -L/usr/local/cuda-9.0/lib64
    3rdparty dependencies:

  OpenCV modules:
    To be built:                 calib3d core cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev dnn features2d flann highgui imgcodecs imgproc java_bindings_generator ml objdetect photo python2 python3 python_bindings_generator shape stitching superres ts video videoio videostab
    Disabled:                    js world
    Disabled by dependency:      -
    Unavailable:                 java viz
    Applications:                tests perf_tests apps
    Documentation:               NO
    Non-free algorithms:         NO

    QT:                          YES (ver 5.5.1)
      QT OpenGL support:         YES (Qt5::OpenGL 5.5.1)
    GTK+:                        NO
    OpenGL support:              YES (/usr/lib/aarch64-linux-gnu/ /usr/lib/aarch64-linux-gnu/
    VTK support:                 NO

  Media I/O: 
    ZLib:                        /usr/lib/aarch64-linux-gnu/ (ver 1.2.8)
    JPEG:                        /usr/lib/aarch64-linux-gnu/ (ver )
    WEBP:                        build (ver encoder: 0x020e)
    PNG:                         /usr/lib/aarch64-linux-gnu/ (ver 1.2.54)
    TIFF:                        /usr/lib/aarch64-linux-gnu/ (ver 42 / 4.0.6)
    JPEG 2000:                   /usr/lib/aarch64-linux-gnu/ (ver 1.900.1)
    OpenEXR:                     build (ver 1.7.1)

  Video I/O:
    DC1394:                      NO
    FFMPEG:                      YES
      avcodec:                   YES (ver 56.60.100)
      avformat:                  YES (ver 56.40.101)
      avutil:                    YES (ver 54.31.100)
      swscale:                   YES (ver 3.1.101)
      avresample:                NO
    GStreamer:
      base:                      YES (ver 1.8.3)
      video:                     YES (ver 1.8.3)
      app:                       YES (ver 1.8.3)
      riff:                      YES (ver 1.8.3)
      pbutils:                   YES (ver 1.8.3)
    libv4l/libv4l2:              1.10.0 / 1.10.0
    v4l/v4l2:                    linux/videodev2.h
    gPhoto2:                     NO

  Parallel framework:            pthreads

  Trace:                         YES (built-in)

  Other third-party libraries:
    Lapack:                      NO
    Eigen:                       YES (ver 3.2.92)
    Custom HAL:                  YES (carotene (ver 0.0.1))
    Protobuf:                    build (3.5.1)

  NVIDIA CUDA:                   YES (ver 9.0, CUFFT CUBLAS FAST_MATH)
    NVIDIA GPU arch:             53
    NVIDIA PTX archs:

  OpenCL:                        YES (no extra features)
    Include path:                /home/nvidia/devel/opencv/opencv/3rdparty/include/opencl/1.2
    Link libraries:              Dynamic load

  Python 2:
    Interpreter:                 /usr/bin/python2.7 (ver 2.7.12)
    Libraries:                   /usr/lib/aarch64-linux-gnu/ (ver 2.7.12)
    numpy:                       /usr/lib/python2.7/dist-packages/numpy/core/include (ver 1.11.0)
    packages path:               lib/python2.7/dist-packages

  Python 3:
    Interpreter:                 /usr/bin/python3 (ver 3.5.2)
    Libraries:                   /usr/lib/aarch64-linux-gnu/ (ver 3.5.2)
    numpy:                       /usr/lib/python3/dist-packages/numpy/core/include (ver 1.11.0)
    packages path:               lib/python3.5/dist-packages

  Python (for build):            /usr/bin/python2.7

    ant:                         NO
    JNI:                         NO
    Java wrappers:               NO
    Java tests:                  NO

  Matlab:                        NO

  Install to:                    /usr/local

The CUDA video reader requires the nvcuvid library, which is not available on TX1/TX2.
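One way to double-check this against the build report posted above: the "NVIDIA CUDA:" line there lists CUFFT, CUBLAS and FAST_MATH only. As far as I remember, OpenCV 3.x adds NVCUVID to that list when the library was found at configure time, but treat that as an assumption rather than documented behaviour. A minimal sketch that scans the text returned by cv::getBuildInformation() for a feature name (the helper name `cudaFeatureListed` is hypothetical, and only the standard library is used):

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns true if `feature` appears as a whole word
// on the "NVIDIA CUDA:" line of an OpenCV build report.
bool cudaFeatureListed(const std::string& buildInfo, const std::string& feature)
{
    std::istringstream lines(buildInfo);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.find("NVIDIA CUDA:") == std::string::npos)
            continue;
        // Turn punctuation into spaces so "(ver 9.0, CUFFT ...)" splits into words.
        for (char& c : line)
            if (c == '(' || c == ')' || c == ',')
                c = ' ';
        std::istringstream words(line);
        std::string word;
        while (words >> word)
            if (word == feature)
                return true;
        return false;  // CUDA line found, feature not listed
    }
    return false;      // no CUDA line at all
}
```

Calling `cudaFeatureListed(cv::getBuildInformation(), "NVCUVID")` before `createVideoReader` would let the program fall back to the CPU reader instead of terminating on the exception.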

You may use V4L or GStreamer with the CPU VideoCapture to access your frames from OpenCV.

You may read this article from e-con Systems and check their helper lib for using the V4L2 userptr method.

Thanks for your reply.

However, I cannot use V4L because the source is an RTSP server, or a file in the worst case.

So I guess I will have to go with GStreamer. Is there an example of an accelerated pipeline in C++ for Jetson?

Thanks in advance.

You may try something like:

#include <iostream>
#include <opencv2/core.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/videoio.hpp>

int main(void)
{
    // GStreamer pipeline: decodebin picks a decoder, videoconvert delivers BGR frames to appsink.
    const char *gst = "rtspsrc location=rtsp:// ! application/x-rtp, media=(string)video "
                      "! decodebin    ! video/x-raw, format=(string)NV12 "
                      "! videoconvert ! video/x-raw, format=(string)BGR "
                      "! appsink";

    cv::VideoCapture cap(gst);
    if (!cap.isOpened()) {
        std::cout << "Error: failed" << std::endl;
        return 1;
    }
    std::cout << "Capture opened" << std::endl;

    unsigned int width = static_cast<unsigned int>(cap.get(cv::CAP_PROP_FRAME_WIDTH));
    unsigned int height = static_cast<unsigned int>(cap.get(cv::CAP_PROP_FRAME_HEIGHT));
    unsigned int pixels = width*height;
    std::cout << "Frame size : " << width << " x " << height << ", " << pixels << " Pixels " << std::endl;

    cv::namedWindow("RTSP_Preview", cv::WINDOW_AUTOSIZE);
    // cv::Mat takes (rows, cols), i.e. (height, width).
    cv::Mat frame_in(height, width, CV_8UC3);

    // Read and display frames until the capture fails.
    while (true) {
        if (!cap.read(frame_in)) {
            std::cout << "Capture read error" << std::endl;
            break;
        }
        else {
            cv::imshow("RTSP_Preview", frame_in);
            cv::waitKey(1);
        }
    }

    return 0;
}

The thing is that, since I am not able to use the GPU, reading a video through OpenCV takes quite a lot of CPU (120%), and I am afraid that with all the other OpenCV operations I am planning it could become a bottleneck.

I have “enhanced” the GStreamer version like this to take hardware decoding into account, with one pipeline for the RTSP stream and one for a file:

"rtspsrc location=rtsp:// ! rtph265depay ! h265parse ! omxh265dec ! videoconvert ! appsink";


filesrc location=/home/nvidia/devel/devel/sv/videos/bbb_sunflower_1080p_30fps_normal.mp4 ! qtdemux name=demux demux.video_0 ! queue ! h264parse ! omxh264dec ! videoconvert ! appsink
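The two pipelines above differ only in the source element and the codec name, so they can be assembled by a small helper. This is only a sketch: the function names (`decodeTail`, `rtspPipeline`, `filePipeline`) are hypothetical, and the element names are simply the ones used in the two pipelines above, not a verified list of what is installed on a given JetPack release.

```cpp
#include <string>

// Hypothetical helper: parse + hardware decode + convert + appsink.
// `codec` is "h264" or "h265"; the omx*dec names come from the pipelines above.
std::string decodeTail(const std::string& codec)
{
    return codec + "parse ! omx" + codec + "dec ! videoconvert ! appsink";
}

// RTSP source: depayload the RTP stream, then decode.
std::string rtspPipeline(const std::string& url, const std::string& codec)
{
    return "rtspsrc location=" + url + " ! rtp" + codec + "depay ! " + decodeTail(codec);
}

// MP4 file source: demux the first video track, then decode.
std::string filePipeline(const std::string& path, const std::string& codec)
{
    return "filesrc location=" + path +
           " ! qtdemux name=demux demux.video_0 ! queue ! " + decodeTail(codec);
}
```

The resulting string would then be passed to cv::VideoCapture in the same way as the hard-coded pipeline in the earlier example.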

The problem is that the GPU is barely used, because reading the frames is not handled by the GPU.

Now I am wondering whether I should use the Tegra Multimedia API to decode the stream and pass it to OpenCV, or maybe use VisionWorks?

Thanks in advance.