"Invalid Configuration Argument" with 2D thread block

Hi,

I'm trying to run a CUDA program on a Tesla C1060 card. I get an "invalid configuration argument" error if I use a 2D thread block for the kernel. To make sure it has nothing to do with my code, I modified template.cu in the template project that comes with the SDK to reproduce it. It works with the following configuration:

[codebox] // setup execution parameters
dim3  grid( 1, 1, 1);
dim3  threads( num_threads, 1, 1);

// execute the kernel
testKernel<<< grid, threads, mem_size >>>( d_idata, d_odata);
[/codebox]

But if I use a 2D thread block like this (note: num_threads = 32)

[codebox] // setup execution parameters
dim3  grid( 1, 1, 1);
dim3  threads( num_threads, 32, 1);

// execute the kernel
testKernel<<< grid, threads, mem_size >>>( d_idata, d_odata);
[/codebox]

it gives me this error:

cutilCheckMsg() CUTIL CUDA error: Kernel execution failed in file <template.cu>, line 117 : invalid configuration argument.

Does it have something to do with the way CUDA is set up or am I doing something wrong here?

Thanks

Shibdas

Hi,

num_threads * 32 may be too large, i.e. you may be requesting too many threads per block. How big is num_threads?

num_threads is 32 only.

I got it… I was confused by the maximum block dimensions in the CUDA Programming Guide. Those are per-dimension limits; the maximum number of threads per block on this card is 512, and that's why 32*32 = 1024 threads generates an error.
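
For anyone hitting the same error, here is a minimal sketch of the check in plain host-side C++. The names `BlockLimits`, `kComputeCapability1x` and `blockConfigIsValid` are hypothetical, made up for illustration. The struct mirrors the `maxThreadsPerBlock` and `maxThreadsDim` fields of `cudaDeviceProp`, and the hard-coded numbers are the compute capability 1.x limits (512 threads per block, 512 x 512 x 64 per dimension). In a real program you would presumably fill these in from `cudaGetDeviceProperties` instead of hard-coding them.

```cpp
// Hypothetical stand-in for the cudaDeviceProp fields that matter here.
struct BlockLimits {
    int maxThreadsPerBlock;  // cap on x * y * z for one block
    int maxThreadsDim[3];    // separate caps for x, y and z
};

// Assumed limits for compute capability 1.x devices such as the Tesla C1060.
static const BlockLimits kComputeCapability1x = {512, {512, 512, 64}};

// Returns true only if the block passes BOTH the per-dimension caps
// and the cap on the total number of threads per block.
bool blockConfigIsValid(const BlockLimits& lim, int x, int y, int z) {
    if (x < 1 || y < 1 || z < 1) {
        return false;  // every dimension must be at least 1
    }
    if (x > lim.maxThreadsDim[0] ||
        y > lim.maxThreadsDim[1] ||
        z > lim.maxThreadsDim[2]) {
        return false;  // one dimension alone is too large
    }
    // Each dimension can be legal while the product is not, e.g. 32 * 32 = 1024.
    long long total = static_cast<long long>(x) * y * z;
    return total <= lim.maxThreadsPerBlock;
}
```

With these limits, `blockConfigIsValid(kComputeCapability1x, 32, 32, 1)` returns false (1024 threads), while a 32 x 16 x 1 block (512 threads) passes.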