Hello,
From what I have understood by playing around with CUDA, it seems that one can have 3-dimensional thread IDs, with the thread block defined as
dim3 threadBlock(8, 8, 8);
The grid dimensions for it can be defined as
dim3 kernelBlockGrid(1, 1, 1);
If I change the last dimension of kernelBlockGrid to anything other than 1, I get the following errors:
cufft: ERROR: D:/Bld/rel/gpgpu/toolkit/r2.0/cufft/src/execute.cu, line 1038
cufft: ERROR: CUFFT_EXEC_FAILED
cufft: ERROR: D:/Bld/rel/gpgpu/toolkit/r2.0/cufft/src/execute.cu, line 297
cufft: ERROR: CUFFT_EXEC_FAILED
cufft: ERROR: D:/Bld/rel/gpgpu/toolkit/r2.0/cufft/src/cufft.cu, line 119
cufft: ERROR: CUFFT_EXEC_FAILED
As far as I understand, when I have something like
myKernel<<<kernelBlockGrid, threadBlock>>> …
in my program, kernelBlockGrid defines the dimensions of the grid and threadBlock defines the layout of the threads within each block. If the maximum number of threads per block is 512, how does having maximum block dimensions of 512 x 512 x 64 help? Perhaps a scenario in which this is useful would help me understand it a bit better.
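To make the setup concrete, here is a minimal sketch of the kind of launch I mean (the kernel name, the buffer, and the index math are placeholders, not my actual code):

#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel: each thread bumps one element of a buffer.
__global__ void myKernel(float *data)
{
    // Flatten the 3D thread index within the 8 x 8 x 8 block.
    int idx = threadIdx.z * blockDim.y * blockDim.x
            + threadIdx.y * blockDim.x
            + threadIdx.x;
    data[idx] += 1.0f;
}

int main()
{
    const int numThreads = 8 * 8 * 8;   // 512 threads, the per-block limit
    float *d_data;
    cudaMalloc((void**)&d_data, numThreads * sizeof(float));
    cudaMemset(d_data, 0, numThreads * sizeof(float));

    dim3 threadBlock(8, 8, 8);          // 3D layout of threads within one block
    dim3 kernelBlockGrid(1, 1, 1);      // changing the last dimension here is what provokes the errors above
    myKernel<<<kernelBlockGrid, threadBlock>>>(d_data);
    printf("launch status: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(d_data);
    return 0;
}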
Thanks.
Regards,
-Alark