Hello. I’m a CUDA newbie.
I’m currently implementing a CUDA program.
I’ve configured the grid and thread blocks for my kernel launch as follows:
dim3 threads( T2, 1);
dim3 grid( 1, T1);
testKernel<<< grid, threads >>>(d_odata, d_v0, d_v1, d_v2, d_u0, d_u1, d_u2);
Here, T1 is 264 and T2 is 528.
Each block is T2 threads wide and 1 thread high,
and the grid is 1 block wide and T1 blocks high.
In other words, each block holds T2 threads laid out along x,
and the grid holds T1 blocks stacked along y.
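To make this layout concrete, here is a minimal sketch of how a kernel launched with grid(1, T1) and threads(T2, 1) might map each thread to one element of a T1 x T2 row-major array. The names rowKernel and flatIndex are hypothetical and chosen for illustration; this is not the body of testKernel, which isn't shown.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: flattens (row, col) of a row-major array
// that is `width` elements wide.
__host__ __device__ constexpr int flatIndex(int row, int col, int width) {
    return row * width + col;
}

// Hypothetical kernel (not the poster's testKernel), meant to be launched
// with grid(1, height) and threads(width, 1):
// each block covers one row, each thread one column of that row.
__global__ void rowKernel(float* out, int width, int height) {
    int row = blockIdx.y;   // grid is 1 block wide and `height` blocks tall
    int col = threadIdx.x;  // block is `width` threads wide and 1 thread tall
    if (row < height && col < width) {
        int i = flatIndex(row, col, width);
        out[i] = static_cast<float>(i);  // write the element's own index
    }
}
```

With T1 = 264 and T2 = 528, a launch such as rowKernel<<<grid, threads>>>(d_out, T2, T1) would touch 264 x 528 = 139,392 elements, assuming the hypothetical d_out was allocated with at least that many floats.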
But when I compile and run it, the program doesn’t work correctly and reports the error
“Invalid configuration argument”.
It seems that the problem lies in the thread and grid configuration.
Is my grid or block layout wrong?
Are the dimensions too big?
I have also read the CUDA manual,
but I couldn’t find any restriction on the size of the grid or the thread blocks.
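The per-device limits on block and grid dimensions can be read at run time with cudaGetDeviceProperties. The sketch below prints those limits and checks a launch shape against them. The helper names printLaunchLimits and launchConfigFits are hypothetical, and the check covers only dimensions, not shared memory or register usage, which can also make a launch fail. As a point of reference, compute capability 1.x devices report maxThreadsPerBlock = 512 (later devices report 1024).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns true if a launch shaped (grid, block) stays
// within the thread and dimension limits reported in `prop`.
// Shared-memory and register limits are NOT checked here.
bool launchConfigFits(const cudaDeviceProp& prop, dim3 grid, dim3 block) {
    unsigned long long threads = 1ULL * block.x * block.y * block.z;
    if (threads > static_cast<unsigned long long>(prop.maxThreadsPerBlock)) return false;
    const unsigned blockDims[3] = {block.x, block.y, block.z};
    const unsigned gridDims[3] = {grid.x, grid.y, grid.z};
    for (int i = 0; i < 3; ++i) {
        if (blockDims[i] > static_cast<unsigned>(prop.maxThreadsDim[i])) return false;
        if (gridDims[i] > static_cast<unsigned>(prop.maxGridSize[i])) return false;
    }
    return true;
}

// Hypothetical helper: prints the launch limits of `device`;
// returns false if the property query fails.
bool printLaunchLimits(int device) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    std::printf("%s (compute capability %d.%d)\n", prop.name, prop.major, prop.minor);
    std::printf("maxThreadsPerBlock: %d\n", prop.maxThreadsPerBlock);
    std::printf("maxThreadsDim: %d x %d x %d\n",
                prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    std::printf("maxGridSize: %d x %d x %d\n",
                prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    return true;
}
```

Calling launchConfigFits(prop, grid, threads) with the grid and threads variables from the launch above, before the <<< >>> call, would flag an oversized block on the host instead of surfacing it only as a launch error.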
Could someone please tell me what the problem is?