We all know that we can create up to 2-dimensional grids and up to 3-dimensional thread blocks.
But why is the dimensionality of Grids and Thread Blocks restricted?
Is there a hardware limitation?
Is it an architectural decision?
Or is there some other reason?
The 2D grid limit is a hardware restriction (as is the 65535 limit on the size of each grid dimension). The Fermi architecture actually supports 3D grids, but this isn’t exposed in CUDA yet.
As Sarnath says, you can easily compute your own n-D indices from blockIdx and threadIdx if you want to.
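For what it's worth, here is a minimal sketch of that index arithmetic, assuming a 2D grid of 1D blocks and a 4D problem with x as the fastest-varying dimension. The names `Index4`, `unflatten4`, `global_linear_id` and `fill4d` are made up for illustration and are not part of CUDA. The macro guard is only there so the helpers also compile as plain C++ on the host.

```cpp
#include <cstddef>

// When not compiling with nvcc, make the CUDA qualifiers no-ops so the
// helpers below can also be built and checked as ordinary C++.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical 4-D index type (not a CUDA built-in).
struct Index4 {
    unsigned w, z, y, x;
};

// Split a linear index into 4-D coordinates, with x varying fastest.
// nx, ny, nz are the extents of the three fastest dimensions.
__host__ __device__ inline Index4 unflatten4(unsigned long long linear,
                                             unsigned nx, unsigned ny,
                                             unsigned nz) {
    Index4 idx;
    idx.x = static_cast<unsigned>(linear % nx); linear /= nx;
    idx.y = static_cast<unsigned>(linear % ny); linear /= ny;
    idx.z = static_cast<unsigned>(linear % nz); linear /= nz;
    idx.w = static_cast<unsigned>(linear);  // whatever remains is the slowest index
    return idx;
}

// Flatten a (2-D grid, 1-D block) position into one global thread number.
__host__ __device__ inline unsigned long long global_linear_id(
        unsigned bx, unsigned by, unsigned grid_dim_x,
        unsigned tx, unsigned block_dim_x) {
    unsigned long long block =
        static_cast<unsigned long long>(by) * grid_dim_x + bx;
    return block * block_dim_x + tx;
}

#ifdef __CUDACC__
// Example kernel: each thread handles one element of an nx*ny*nz*nw volume.
__global__ void fill4d(float* out, unsigned nx, unsigned ny,
                       unsigned nz, unsigned nw) {
    unsigned long long linear = global_linear_id(
        blockIdx.x, blockIdx.y, gridDim.x, threadIdx.x, blockDim.x);
    unsigned long long total =
        static_cast<unsigned long long>(nx) * ny * nz * nw;
    if (linear >= total) return;  // the grid may be larger than the data
    Index4 i = unflatten4(linear, nx, ny, nz);
    out[linear] = static_cast<float>(i.w + i.z + i.y + i.x);
}
#endif
```

On the host you would launch this with a 2D grid holding at least ceil(total / blockDim.x) blocks, keeping each grid dimension within the 65535 limit. The same modulo/divide pattern extends to any number of dimensions.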