I want to use three-dimensional thread blocks and grids in one of my image processing applications.
The output image is 512 x 512 x 128.
Here is what I am trying to do.
blocks.x = 8, blocks.y = 8, blocks.z = 8
grids.x = 512/8, grids.y = 512/8, grids.z = 128/8.
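A minimal sketch of what such a launch could look like is given here for reference. The kernel `fillVolume`, the helpers `makeGrid` and `launchFillVolume`, and the row-major voxel layout are placeholders chosen for illustration, not the actual application code, and `d_out` is assumed to be a device buffer of width * height * depth ints.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread handles one voxel and stores its linear index.
__global__ void fillVolume(int *out, int width, int height, int depth)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;

    // Guard against threads that fall outside the volume.
    if (x >= width || y >= height || z >= depth) return;

    int idx = (z * height + y) * width + x;  // assumed row-major layout
    out[idx] = idx;
}

// Hypothetical host helper: number of blocks per axis, rounded up.
dim3 makeGrid(int width, int height, int depth, dim3 block)
{
    return dim3((width  + block.x - 1) / block.x,
                (height + block.y - 1) / block.y,
                (depth  + block.z - 1) / block.z);
}

// Launches the kernel with 8 x 8 x 8 threads per block and reports any error.
// d_out is assumed to hold width * height * depth ints on the device.
cudaError_t launchFillVolume(int *d_out, int width, int height, int depth)
{
    dim3 block(8, 8, 8);
    dim3 grid = makeGrid(width, height, depth, block);

    fillVolume<<<grid, block>>>(d_out, width, height, depth);

    cudaError_t err = cudaGetLastError();  // invalid launch configuration
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();        // errors raised while the kernel runs
}
```

For a 512 x 512 x 128 volume, `makeGrid` gives a 64 x 64 x 16 grid of 512-thread blocks, so each voxel is covered by exactly one thread.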
With this configuration the kernel times out, because it takes more than a few seconds to run.
But when I comment out the grids.z assignment, it takes the default value (1) and the kernel runs fine.
To my surprise, the output is also correct.
Please let me know whether my understanding of 3-D grids is correct or whether I am missing something.
I am using a GTX 660 (Kepler, compute capability 3.0) and CUDA 5.0.
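In case it helps with diagnosis, the limits the device reports can be printed with the runtime API. This is only a sketch: `printLaunchLimits` and `gridFits` are hypothetical helper names, and the output depends on the installed card and driver.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if every grid dimension is within the device limit.
bool gridFits(const cudaDeviceProp &prop, dim3 grid)
{
    return grid.x <= (unsigned)prop.maxGridSize[0] &&
           grid.y <= (unsigned)prop.maxGridSize[1] &&
           grid.z <= (unsigned)prop.maxGridSize[2];
}

// Hypothetical helper: prints the launch-related limits of one device.
// Returns false if the properties could not be queried.
bool printLaunchLimits(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return false;

    printf("%s (compute capability %d.%d)\n", prop.name, prop.major, prop.minor);
    printf("max grid size:         %d x %d x %d\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    printf("max threads per block: %d\n", prop.maxThreadsPerBlock);
    // Non-zero when a run-time limit (watchdog) applies to kernels on this device.
    printf("kernel timeout active: %d\n", prop.kernelExecTimeoutEnabled);
    return true;
}
```

Calling `printLaunchLimits(0)` before the launch shows whether a 64 x 64 x 16 grid is within the reported limits and whether the run-time limit on kernels is enabled for the card.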