Quick question: Is it possible to have 3D thread blocks in CUDA? I assume the largest 3D thread block with equal-sized dimensions that CUDA's constraints allow is 8x8x8 = 512 threads; is that correct? Is it good practice?
Also, can we have 3D arrays (i.e. a[x][y][z]) in device code, in both global and shared memory? How is memory coalescing possible with this layout?
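On the array question, one common approach (a minimal sketch, not the only way to do it) is to keep the 3D array in a flat buffer and compute the offset by hand with x varying fastest, so that threads with consecutive threadIdx.x touch consecutive addresses, which is the pattern coalescing favors. The names flatten3D, scale3D and launchScale3D are made up for illustration, and the 2D grid with a loop over z is an assumption for hardware that only accepts 2D grids.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper: map (x, y, z) to a linear offset, x varying fastest.
__host__ __device__ inline size_t flatten3D(int x, int y, int z, int nx, int ny)
{
    return (static_cast<size_t>(z) * ny + y) * nx + x;
}

// Scale every element of an nx*ny*nz volume stored in a flat buffer.
__global__ void scale3D(float *a, int nx, int ny, int nz, float s)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= nx || y >= ny) return;

    // The grid is 2D here, so each thread strides over z by the block depth.
    for (int z = threadIdx.z; z < nz; z += blockDim.z)
        a[flatten3D(x, y, z, nx, ny)] *= s;
}

// Launch with an 8x8x8 block (512 threads) and a 2D grid covering x and y.
cudaError_t launchScale3D(float *dA, int nx, int ny, int nz, float s)
{
    dim3 block(8, 8, 8);
    dim3 grid((nx + block.x - 1) / block.x, (ny + block.y - 1) / block.y);
    scale3D<<<grid, block>>>(dA, nx, ny, nz, s);
    return cudaGetLastError();  // reports an invalid launch configuration, if any
}
```

Note that with an 8x8x8 block a warp of 32 threads spans four rows of x, since threads are numbered with x fastest, then y, then z. A shape such as 32x4x4 has the same 512 threads but puts a whole warp on a single row.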
I get a configuration error when I try to launch a kernel with a thread block size of 4x4x4 and a grid size of 8x8x8.
When I change to a 2-D block and grid, the error goes away.
Does CUDA really support 3-D blocks and grids? I’ve heard from some people that it is not a working feature …
Any comments?
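For what it's worth, a possible workaround when only 2D grids are accepted is to fold the z dimension of a logical 3D grid into y and recover it inside the kernel. This is only a sketch under that assumption; unfoldYZ, recordBlockIds and launchFolded are hypothetical names, not part of CUDA.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: split a folded row index back into logical (y, z),
// where gy is the logical grid size in y.
__host__ __device__ inline void unfoldYZ(unsigned foldedY, unsigned gy,
                                         unsigned *by, unsigned *bz)
{
    *by = foldedY % gy;
    *bz = foldedY / gy;
}

// One thread per block records the logical 3D block index it recovered.
__global__ void recordBlockIds(uint3 *out, unsigned gy)
{
    unsigned by, bz;
    unfoldYZ(blockIdx.y, gy, &by, &bz);
    if (threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0) {
        unsigned id = blockIdx.y * gridDim.x + blockIdx.x;
        out[id] = make_uint3(blockIdx.x, by, bz);
    }
}

// Launch a logical gx*gy*gz grid as a 2D grid of size gx by (gy*gz).
// dOut must hold gx*gy*gz entries in device memory.
cudaError_t launchFolded(uint3 *dOut, dim3 logicalGrid, dim3 block)
{
    dim3 grid(logicalGrid.x, logicalGrid.y * logicalGrid.z);  // z folded into y
    recordBlockIds<<<grid, block>>>(dOut, logicalGrid.y);
    return cudaGetLastError();  // non-success if the shape is rejected
}
```

With a logical 8x8x8 grid this launches an 8x64 grid, so the block count stays at 512 while the launch itself is 2D. Checking the value returned by cudaGetLastError right after the launch is how a configuration error like the one above would show up.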
Oh, you’ve just heard down the grapevine that it doesn’t work. Just read the manual or look at the output of the SDK's deviceQuery; both are more reliable. The grid dimension limits are clearly documented as 65535x65535x1 (i.e. grids are 2D only), so a 4x4x4 block is fine, but an 8x8x8 grid is not.
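If you'd rather check the limits from your own program than run deviceQuery, something along these lines should work. It is a rough sketch: cudaGetDeviceProperties and the maxThreadsPerBlock, maxThreadsDim and maxGridSize fields are part of the CUDA runtime API, while fitsDeviceLimits and printLaunchLimits are hypothetical helper names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if the block and grid shapes fit the reported limits.
bool fitsDeviceLimits(const cudaDeviceProp &p, dim3 block, dim3 grid)
{
    unsigned long long threads = 1ULL * block.x * block.y * block.z;
    bool blockOk = block.x <= (unsigned)p.maxThreadsDim[0] &&
                   block.y <= (unsigned)p.maxThreadsDim[1] &&
                   block.z <= (unsigned)p.maxThreadsDim[2] &&
                   threads <= (unsigned long long)p.maxThreadsPerBlock;
    bool gridOk = grid.x <= (unsigned)p.maxGridSize[0] &&
                  grid.y <= (unsigned)p.maxGridSize[1] &&
                  grid.z <= (unsigned)p.maxGridSize[2];
    return blockOk && gridOk;
}

// Print the launch limits the runtime reports for one device.
cudaError_t printLaunchLimits(int device)
{
    cudaDeviceProp p;
    cudaError_t err = cudaGetDeviceProperties(&p, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return err;
    }
    printf("max threads per block: %d\n", p.maxThreadsPerBlock);
    printf("max block dimensions:  %d x %d x %d\n",
           p.maxThreadsDim[0], p.maxThreadsDim[1], p.maxThreadsDim[2]);
    printf("max grid dimensions:   %d x %d x %d\n",
           p.maxGridSize[0], p.maxGridSize[1], p.maxGridSize[2]);
    return cudaSuccess;
}
```

Calling fitsDeviceLimits with the properties of your device and the shape you intend to launch tells you up front whether the launch will be rejected, instead of finding out from the configuration error.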