[Newbie] Getting "unspecified launch failure" errors

Hello everyone,

I’m very new to CUDA programming and am working my way through the Kirk & Hwu book. One of the first things I’m trying is adding two matrices. This should be really easy, and I’m sure I’m missing something obvious, but the error message I’m getting is thoroughly unhelpful. I’m running this on a laptop with Fedora 20 and a GTX 265M card.

The offending code is here:

HANDLE_ERROR( cudaMalloc( (void**)&dev_A, rows*cols*sizeof(int) ) );
HANDLE_ERROR( cudaMalloc( (void**)&dev_B, rows*cols*sizeof(int) ) );
HANDLE_ERROR( cudaMalloc( (void**)&dev_C, rows*cols*sizeof(int) ) );

HANDLE_ERROR( cudaMemcpy( dev_A, A, rows*cols*sizeof(int), cudaMemcpyHostToDevice ) );
HANDLE_ERROR( cudaMemcpy( dev_B, B, rows*cols*sizeof(int), cudaMemcpyHostToDevice ) );

dim3 block( 16, 16 );
dim3 grid( (rows + 15)/16, (cols + 15)/16 );

matrix_add<<< grid, block >>>( A, B, C, rows, cols );
HANDLE_ERROR( cudaMemcpy( C, dev_C, rows*cols*sizeof(int), cudaMemcpyDeviceToHost ) );

The last cudaMemcpy (device to host) is where my code fails with "unspecified launch failure". Again, I’m sure I’m missing something really easy. Thanks in advance!

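Shouldn’t you be passing dev_A, dev_B, and dev_C to your kernel instead of A, B, and C? A, B, and C are host pointers, which the kernel can’t dereference; it needs the device pointers you got from cudaMalloc.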

Like this:

matrix_add<<< grid, block >>>( dev_A, dev_B, dev_C, rows, cols );
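For anyone who lands here with the same error, here is a minimal self-contained sketch of the corrected flow. The question doesn't show the matrix_add kernel or the HANDLE_ERROR definition, so both are assumptions here: the kernel assumes a row-major layout with x indexing rows and y indexing columns (to match the grid in the question), HANDLE_ERROR is a stand-in that prints the error and exits, and add_matrices_on_device is a hypothetical helper name. The sketch also checks for errors right after the launch. Kernel launches are asynchronous, so a fault inside the kernel is only reported by the next call that waits on the GPU, which is why the original error surfaced at the final cudaMemcpy.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Stand-in for the question's HANDLE_ERROR (its real definition isn't shown):
// print the CUDA error with file/line and abort.
static void handle_error(cudaError_t err, const char *file, int line)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s in %s at line %d\n",
                     cudaGetErrorString(err), file, line);
        std::exit(EXIT_FAILURE);
    }
}
#define HANDLE_ERROR(err) (handle_error((err), __FILE__, __LINE__))

// Hypothetical kernel: element-wise C = A + B on a row-major rows x cols matrix.
// x indexes rows and y indexes columns, matching the grid in the question.
__global__ void matrix_add(const int *A, const int *B, int *C, int rows, int cols)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    int col = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < cols) {  // grid is rounded up, so guard the edges
        int i = row * cols + col;
        C[i] = A[i] + B[i];
    }
}

// Hypothetical host helper: copy A and B to the device, add, copy C back.
void add_matrices_on_device(const int *A, const int *B, int *C, int rows, int cols)
{
    size_t bytes = (size_t)rows * cols * sizeof(int);
    int *dev_A, *dev_B, *dev_C;

    HANDLE_ERROR(cudaMalloc((void**)&dev_A, bytes));
    HANDLE_ERROR(cudaMalloc((void**)&dev_B, bytes));
    HANDLE_ERROR(cudaMalloc((void**)&dev_C, bytes));

    HANDLE_ERROR(cudaMemcpy(dev_A, A, bytes, cudaMemcpyHostToDevice));
    HANDLE_ERROR(cudaMemcpy(dev_B, B, bytes, cudaMemcpyHostToDevice));

    dim3 block(16, 16);
    dim3 grid((rows + 15) / 16, (cols + 15) / 16);

    // The fix: pass the device pointers, not the host pointers A, B, C.
    matrix_add<<<grid, block>>>(dev_A, dev_B, dev_C, rows, cols);
    HANDLE_ERROR(cudaGetLastError());       // catches bad launch configurations
    HANDLE_ERROR(cudaDeviceSynchronize());  // catches faults while the kernel runs

    HANDLE_ERROR(cudaMemcpy(C, dev_C, bytes, cudaMemcpyDeviceToHost));

    HANDLE_ERROR(cudaFree(dev_A));
    HANDLE_ERROR(cudaFree(dev_B));
    HANDLE_ERROR(cudaFree(dev_C));
}
```

Calling it with two 2 x 3 host arrays fills C with their element-wise sum. Keeping x over rows leaves the question's grid unchanged; swapping the mapping so threadIdx.x walks along columns would usually give better-coalesced memory access, since consecutive threads would then touch consecutive addresses.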

You are absolutely correct. As I said, I knew it was something extremely obvious, but I had stared at the problem too long to figure it out.