Invalid configuration argument

Hello. I’m a CUDA newbie.
I’m currently implementing a CUDA program.
I set up the grid and thread block for the program’s kernel as shown below.

dim3 threads( T2, 1);
dim3 grid( 1, T1);
testKernel<<< grid, threads >>>(d_odata, d_v0, d_v1, d_v2, d_u0, d_u1, d_u2);

Here, T1 is 264 and T2 is 528.
I designed each block to be T2 wide and 1 high,
and the grid to be 1 wide and T1 high.
With this design, each block holds T2 threads along its width,
and the grid holds T1 blocks along its height.
But when I compiled and ran it, the program did not work correctly and gave the
message “Invalid configuration argument”.
It seems that there is a problem with the threads and grid.
Is the design of my grid and threads wrong?
Are they too big?
I have also read the CUDA manual,
but I couldn’t find any restriction on the size of the grid or threads.
Please tell me what the problem is.



The maximum number of threads per block is 512, as stated in the CUDA Programming Guide, section 5.1. Your block has T2 = 528 threads, which exceeds that limit.
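
If it helps, here is a minimal sketch of how the limit could be checked at run time rather than looked up in the guide, since the exact value depends on the device. It assumes device 0, and `dummyKernel` is a hypothetical empty kernel standing in for testKernel; only the launch configuration is taken from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder; stands in for testKernel from the question.
__global__ void dummyKernel() {}

int main() {
    const int T1 = 264, T2 = 528;  // values from the question

    // Query the per-block thread limit of device 0 (assumed device index).
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("maxThreadsPerBlock = %d\n", prop.maxThreadsPerBlock);

    dim3 threads(T2, 1);
    dim3 grid(1, T1);

    // Compare the requested block size against the device limit before launching.
    unsigned int blockSize = threads.x * threads.y * threads.z;
    if (blockSize > (unsigned int)prop.maxThreadsPerBlock) {
        std::printf("block of %u threads exceeds the device limit\n", blockSize);
    }

    // Launch anyway and report whatever error the runtime returns.
    dummyKernel<<<grid, threads>>>();
    err = cudaGetLastError();
    if (err != cudaSuccess) {
        std::printf("kernel launch failed: %s\n", cudaGetErrorString(err));
    }
    cudaDeviceSynchronize();
    return 0;
}
```

Calling cudaGetLastError() right after the launch is what surfaces a configuration error like this one, because the launch itself does not return a status.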


Ah, really?
Thanks very much!
Well, I should fix my program with a new kernel design.
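
One possible redesign, sketched under assumptions: split each row of T2 threads across several smaller blocks and add a bounds check in the kernel. The block size of 256 is an arbitrary choice below the 512 limit, and `indexKernel` is a hypothetical kernel that only writes each thread's linear index; it is not the original testKernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread writes its own linear index.
__global__ void indexKernel(int *out, int width) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column across blocks
    int y = blockIdx.y;                             // one grid row per data row
    if (x < width) {                                // last block in a row is partly idle
        out[y * width + x] = y * width + x;
    }
}

int main() {
    const int T1 = 264, T2 = 528;  // values from the question
    const int BLOCK = 256;         // assumed block size, below the 512 limit

    dim3 threads(BLOCK, 1);
    dim3 grid((T2 + BLOCK - 1) / BLOCK, T1);  // round up so all T2 columns are covered

    int *d_out = NULL;
    cudaError_t err = cudaMalloc((void **)&d_out, T1 * T2 * sizeof(int));
    if (err != cudaSuccess) {
        std::printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    indexKernel<<<grid, threads>>>(d_out, T2);
    err = cudaGetLastError();
    if (err != cudaSuccess) {
        std::printf("kernel launch failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    // Copy back the last element as a quick sanity check.
    int last = -1;
    cudaMemcpy(&last, d_out + (T1 * T2 - 1), sizeof(int), cudaMemcpyDeviceToHost);
    std::printf("%s\n", last == T1 * T2 - 1 ? "ok" : "mismatch");

    cudaFree(d_out);
    return 0;
}
```

The rounding-up in the grid width means some threads in the last block of each row fall outside the data, which is why the kernel needs the `x < width` check.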