CUDA 6.5 with OpenCV4Tegra and 352.63 Driver

Hello,

I want to use CUDA 6.5 with OpenCV4Tegra on my notebook, which runs Ubuntu 14.04 and has a GeForce GT 540M GPU.

For the installation I used this guide:
http://developer.download.nvidia.com/embedded/OpenCV/L4T_21.1/README.txt

I downloaded it from here

cuda-repo-ubuntu1404-6-5-prod_6.5-19_amd64.deb + libopencv4tegra-repo_ubuntu1404_2.4.10.1_amd64.deb

However, I do not use the 340.29 driver bundled with CUDA 6.5, because it causes a black screen after installation. Instead, I installed the 352.63 driver, which is bundled with CUDA 7.5.

The CUDA samples can be compiled and run successfully.


deviceQuery shows:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GT 540M"
  CUDA Driver Version / Runtime Version          7.5 / 6.5
  CUDA Capability Major/Minor version number:    2.1
  Total amount of global memory:                 2048 MBytes (2147155968 bytes)
  ( 2) Multiprocessors, ( 48) CUDA Cores/MP:     96 CUDA Cores
  GPU Clock rate:                                1344 MHz (1.34 GHz)
  Memory Clock rate:                             900 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 131072 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (65535, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime Version = 6.5, NumDevs = 1, Device0 = GeForce GT 540M
Result = PASS

Next, I wanted to use OpenCV with this simple example:

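#include <iostream>
#include "opencv2/opencv.hpp"
#include "opencv2/gpu/gpu.hpp"

int main()
{
    // Load the input image as 8-bit grayscale.
    cv::Mat src_host = cv::imread("/home/obelov/file.png", CV_LOAD_IMAGE_GRAYSCALE);
    if (src_host.empty())
    {
        std::cerr << "Could not read the input image" << std::endl;
        return 1;
    }

    // Copy the image into GPU memory.
    cv::gpu::GpuMat src, dst;
    src.upload(src_host);

    // Binary threshold on the GPU.
    cv::gpu::threshold(src, dst, 128.0, 255.0, CV_THRESH_BINARY); // here comes the error

    // Copy the result back to host memory and display it.
    cv::Mat result_host;
    dst.download(result_host);

    cv::imshow("Result", result_host);
    cv::waitKey();
    return 0;
}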

This code compiles successfully, but running it produces this error:

OpenCV Error: Gpu API call (invalid device function ) in call, file /hdd/buildbot/slaves/slave_ubuntu14/54-O4T-Ubuntu14/opencv/modules/gpu/include/opencv2/gpu/device/detail/transform_detail.hpp, line 361
terminate called after throwing an instance of 'cv::Exception'
  what():  /hdd/buildbot/slaves/slave_ubuntu14/54-O4T-Ubuntu14/opencv/modules/gpu/include/opencv2/gpu/device/detail/transform_detail.hpp:361: error: (-217) invalid device function  in function call

Could this error be caused by the 352.63 driver, given that CUDA 6.5 ships with 340.29? deviceQuery reports:
CUDA Driver Version / Runtime Version 7.5 / 6.5
But CUDA without OpenCV works great!

I’m a little confused as to why you are using opencv4tegra in a non-tegra environment.

The error you are getting:

invalid device function  in function call

is indicative of trying to run a GPU operation using code that was not compiled for the GPU you are running on.

CUDA-capable Tegra devices are cc3.2 (Tegra K1) or cc5.3 (Tegra X1) devices. Your GT540M is a cc2.1 device. So if your opencv4tegra libraries were compiled for cc3.x and/or cc5.3, they would not run properly on your cc2.1 device, and the error you are getting ("invalid device function") is exactly the error I would expect in such a scenario.
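As an illustration of the rule behind this error, here is a minimal C++ sketch of the compatibility check as the CUDA C Programming Guide describes it: a cubin built for compute capability X.y runs only on devices of compute capability X.z with z >= y, while PTX built for X.y can be JIT-compiled by the driver for any device of compute capability X.y or higher. The names ComputeCapability, cubinRunsOn, ptxRunsOn and hasUsableImage are hypothetical (not part of CUDA or OpenCV), and which architectures a particular OpenCV4Tegra package actually embeds is an assumption you would have to verify for your build.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper type: a CUDA compute capability such as 2.1 or 5.3.
struct ComputeCapability {
    int major;
    int minor;
};

// True if `device` is at least as new as `built`, comparing (major, minor).
bool atLeast(ComputeCapability device, ComputeCapability built) {
    assert(device.major >= 1 && built.major >= 1);
    return device.major > built.major ||
           (device.major == built.major && device.minor >= built.minor);
}

// A cubin (SASS) built for X.y runs only on devices X.z with z >= y.
bool cubinRunsOn(ComputeCapability built, ComputeCapability device) {
    return built.major == device.major && device.minor >= built.minor;
}

// PTX built for X.y can be JIT-compiled for any device >= X.y.
bool ptxRunsOn(ComputeCapability built, ComputeCapability device) {
    return atLeast(device, built);
}

// True if at least one embedded cubin or PTX image is usable on the device.
// In this model, a false result corresponds to "invalid device function".
bool hasUsableImage(const std::vector<ComputeCapability>& cubins,
                    const std::vector<ComputeCapability>& ptx,
                    ComputeCapability device) {
    for (const ComputeCapability& c : cubins)
        if (cubinRunsOn(c, device)) return true;
    for (const ComputeCapability& p : ptx)
        if (ptxRunsOn(p, device)) return true;
    return false;
}
```

For the GT 540M (cc2.1), a library that only carries cc3.x or cc5.3 images has nothing usable, while a build that includes a 2.0 cubin or 2.0 PTX would be able to run.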

You should probably learn more about OpenCV and how to install and/or build it appropriately for your platform.

The only cause for “invalid device function” that I am aware of is that the device code was compiled for a GPU architecture that is incompatible with the architecture of the GPU actually in the system. So check the OpenCV build settings.

What immediately raises a red flag in my mind is that you use opencv4tegra. Tegra is a newer architecture than sm_21 (I don’t recall which one off the top of my head), so given the name it probably has default build settings appropriate for Tegra platforms, which then generate device code that cannot execute on an sm_21 device.

Generally speaking, you can update drivers to later versions than the minimum specified for a particular CUDA version. Only drivers that are older than the specified version would be a problem. I currently use the latest WHQLed Windows driver 362.13 here, with CUDA 6.5 and CUDA 7.5.
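To illustrate with the version numbers from the deviceQuery output earlier in this thread, here is a simplified C++ sketch. It assumes the 1000 * major + 10 * minor encoding that the CUDA runtime functions cudaDriverGetVersion and cudaRuntimeGetVersion use (so 7.5 becomes 7050). The helper names are hypothetical; in practice the CUDA runtime performs this check itself and reports cudaErrorInsufficientDriver when the driver is older than the runtime.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: encode a CUDA version as 1000 * major + 10 * minor,
// the format returned by cudaDriverGetVersion / cudaRuntimeGetVersion.
int encodeCudaVersion(int major, int minor) {
    assert(major >= 0 && minor >= 0 && minor < 10);
    return 1000 * major + 10 * minor;
}

// Format an encoded version the way deviceQuery prints it, e.g. 7050 -> "7.5".
std::string formatCudaVersion(int encoded) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "%d.%d", encoded / 1000, (encoded % 100) / 10);
    return std::string(buf);
}

// A newer driver can run applications built against an older runtime;
// only a driver that is older than the runtime is a problem.
bool driverSupportsRuntime(int driverVersion, int runtimeVersion) {
    return driverVersion >= runtimeVersion;
}
```

With the 352.63 driver (reported as CUDA 7.5) and the 6.5 runtime, driverSupportsRuntime returns true, which is consistent with deviceQuery passing. The reverse pairing (a 6.5 driver with a 7.5 runtime) would fail.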

Oh, I had not realized that OpenCV4Tegra 2.4.10.1 is not compatible with cc2.1 devices.
Thank you for your help.

good to know!
Thank you!