Hi,
I'm trying to run a CUDA program on a Tesla C1060 card, and I get an "invalid configuration argument" error whenever I use a 2D thread block for the kernel. To rule out my own code, I modified template.cu in the template project that ships with the SDK. It works with the following configuration:
[codebox] // setup execution parameters
dim3 grid( 1, 1, 1);
dim3 threads( num_threads, 1, 1);
// execute the kernel
testKernel<<< grid, threads, mem_size >>>( d_idata, d_odata);
[/codebox]
But if I switch to a 2D thread block like this (note: num_threads = 32),
[codebox] // setup execution parameters
dim3 grid( 1, 1, 1);
dim3 threads( num_threads, 32, 1);
// execute the kernel
testKernel<<< grid, threads, mem_size >>>( d_idata, d_odata);
[/codebox]
the kernel launch fails with this error:
cutilCheckMsg() CUTIL CUDA error: Kernel execution failed in file <template.cu>, line 117 : invalid configuration argument.
Is this related to how CUDA is set up on my machine, or am I doing something wrong?
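For reference, the second configuration requests num_threads × 32 = 32 × 32 = 1024 threads in a single block. Below is a small host-side sketch that checks a block shape against per-block limits before launching. The BlockLimits struct, kComputeCapability13 and fits_block_limits are hypothetical names of my own. The values assume the documented compute capability 1.x limits (512 threads per block, maximum block dimensions 512 × 512 × 64), which apply to the C1060 (compute 1.3). On a real device you would fill them from cudaDeviceProp (maxThreadsPerBlock, maxThreadsDim) via cudaGetDeviceProperties instead of hard-coding them.

```cpp
#include <cstdio>

// Hypothetical helper types. The values in kComputeCapability13 assume the
// documented limits for compute capability 1.x (e.g. Tesla C1060, CC 1.3).
// On a real device, read them from cudaDeviceProp instead.
struct BlockLimits {
    unsigned max_threads_per_block;  // total threads allowed in one block
    unsigned max_x, max_y, max_z;    // per-axis block dimension limits
};

const BlockLimits kComputeCapability13 = {512, 512, 512, 64};

// Returns true if a bx * by * bz thread block satisfies the given limits;
// otherwise prints the reason and returns false.
bool fits_block_limits(unsigned bx, unsigned by, unsigned bz,
                       const BlockLimits& lim) {
    if (bx == 0 || by == 0 || bz == 0) {
        std::printf("every block dimension must be at least 1\n");
        return false;
    }
    if (bx > lim.max_x || by > lim.max_y || bz > lim.max_z) {
        std::printf("block (%u, %u, %u) exceeds the per-axis limits (%u, %u, %u)\n",
                    bx, by, bz, lim.max_x, lim.max_y, lim.max_z);
        return false;
    }
    // Multiply in 64 bits so very large dimensions cannot overflow.
    unsigned long long total = static_cast<unsigned long long>(bx) * by * bz;
    if (total > lim.max_threads_per_block) {
        std::printf("%llu threads per block exceeds the limit of %u\n",
                    total, lim.max_threads_per_block);
        return false;
    }
    return true;
}
```

With these assumed limits, fits_block_limits(32, 1, 1, kComputeCapability13) passes while fits_block_limits(32, 32, 1, kComputeCapability13) fails on the 1024-thread total, which lines up with the working and failing launches above. A 16 × 32 block (512 threads) would still fit.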
Thanks
Shibdas