unspecified launch failure in function caller using OpenCV for Tegra 2.4.13

Hi everyone,

running this code:

gpu::GpuMat cv_imgLeft_gpu;
gpu::GpuMat cv_imgLeft_gpu_blur;
Size filter_size;
filter_size.width = 7;
filter_size.height = 7;

// create filter
cv::Ptr<gpu::FilterEngine_GPU> filter = gpu::createGaussianFilter_GPU(CV_16SC1, filter_size, gauss_sigma);

while (true) {
    // get image, function is from camera supplier, image is stored in pointer ptrGrabResultLeft
    CameraLeft.RetrieveResult(5000, ptrGrabResultLeft, TimeoutHandling_ThrowException);

    // constructor for GpuMatrix headers pointing to user-allocated data, load image to GPU memory
    cv_imgLeft_gpu = gpu::GpuMat(ptrGrabResultLeft -> GetHeight(), ptrGrabResultLeft -> GetWidth(), CV_16SC1,  

    // apply filter
    filter->apply(cv_imgLeft_gpu, cv_imgLeft_gpu_blur, cv::Rect(0, 0, cv_imgLeft_gpu.cols, cv_imgLeft_gpu.rows));
}

produces this error output in my terminal:

OpenCV Error: Gpu API call (unspecified launch failure) in caller, file /hdd/buildbot/slave_jetson_tx_4/35-O4T-L4T-R24-armhf/opencv/modules/gpu/src/cuda/row_filter.h, line 174
terminate called after throwing an instance of 'cv::Exception'
  what():  error: (-217) unspecified launch failure in function caller

I am using a TX1 with OpenCV for Tegra 2.4.13. The camera delivers 12-bit images stored in 16-bit pixels. If I first load the image into a cv::Mat and then upload it to a GpuMat, everything works fine. However, that conversion takes too long, and I want to avoid it. What am I missing?

Thanks for your help.

Best ManuKlause

Hi ManuKlause,

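When you create the cv::Mat first and then the GpuMat from it, things work. This indicates that the image from your camera is in host memory, so it must be explicitly uploaded to GPU memory before use. The GpuMat constructor that takes a data pointer only creates a header around existing memory and copies nothing, so the filter kernel ends up reading a host address that the GPU cannot access, and the launch fails. You can hide some of the transfer latency by using streams with asynchronous copies. Here is a nice presentation from GTC 2013 that you should review for more info: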


Hope this helps in your case.


Hi kayccc, you were right. I contacted the camera supplier, and it seems the camera is not able to write the image to GPU memory directly. So I have to write it to host memory first and then upload it. I will try asynchronous transfers now. Thank you.

By using asynchronous transfers, I would be using the overlap mode, right?

Hi ManuKlause,

Yes. By using asynchronous transfers as outlined in the section of the presentation called “Overlapping operations with CUDA”, you can overlap the uploads with the filtering and thereby minimize the total latency of your application.