I’m sure this has been discussed before, but I’m finding the search function of this forum to be somewhat lacking (and very frustrating).
I’ve reached a point where my kernel needs more than 65535 blocks, so I need to go to a 2-dimensional grid for the first time. The example in the programming guide is redundant due to the 16x16 grid size, and I’m not really interested in the “coordinates” of the block anyway. I’m really only interested in numbering the blocks sequentially, so that each block has a unique number. So, to obtain this unique number, I could use a formula such as:
A three-dimensional grid of blocks isn’t supported in CUDA (yet), so you have only those 6 combinations.
UniqueThreadIndex means unique per grid.
Of course, you can also calculate a LocalThreadIndex (per block) for a three-dimensional array of threads, i.e.:
LocalThreadIndex = threadIdx.z * blockDim.y * blockDim.x + threadIdx.y * blockDim.x + threadIdx.x;
When blockDim.z is 1, threadIdx.z is always 0, so this is equivalent to the simpler version that omits the z dimension. Conceptually, though, blockDim.z * blockDim.y * blockDim.x is the number of threads per block, so the total is UniqueBlockIndex * (number of threads per block) + (thread number within this block).
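For what it's worth, here is a minimal sketch of how the pieces above could fit together in a kernel. The helper names (uniqueBlockIndex, localThreadIndex, uniqueThreadIndex) and the kernel fillIndices are made up for illustration, and the block numbering assumes x varies fastest (blockIdx.y * gridDim.x + blockIdx.x), which is only one of the possible orderings.

```cuda
#include <cuda_runtime.h>

// Sequential number of a block in a 2D grid.
// Assumption: x varies fastest (row-major); other orderings work equally well.
__host__ __device__ inline unsigned int uniqueBlockIndex(uint3 bIdx, dim3 gDim)
{
    return bIdx.y * gDim.x + bIdx.x;
}

// Sequential number of a thread inside its own (up to 3D) block.
__host__ __device__ inline unsigned int localThreadIndex(uint3 tIdx, dim3 bDim)
{
    return tIdx.z * bDim.y * bDim.x + tIdx.y * bDim.x + tIdx.x;
}

// Grid-wide index: block number * threads per block + thread number in block.
// Note: 32-bit arithmetic, so the total thread count must fit in unsigned int.
__host__ __device__ inline unsigned int uniqueThreadIndex(uint3 bIdx, dim3 gDim,
                                                          uint3 tIdx, dim3 bDim)
{
    unsigned int threadsPerBlock = bDim.x * bDim.y * bDim.z;
    return uniqueBlockIndex(bIdx, gDim) * threadsPerBlock
         + localThreadIndex(tIdx, bDim);
}

// Hypothetical kernel: each thread writes its own grid-wide index.
// The bounds check matters because a 2D grid usually has more threads than n.
__global__ void fillIndices(unsigned int *out, unsigned int n)
{
    unsigned int i = uniqueThreadIndex(blockIdx, gridDim, threadIdx, blockDim);
    if (i < n) {
        out[i] = i;
    }
}
```

You would launch it with something like fillIndices<<<dim3(gx, gy), dim3(bx, by, bz)>>>(d_out, n), choosing gx and gy so that gx * gy covers the number of blocks you need while each stays within the per-dimension limit. Since the helpers are also marked __host__, you can sanity-check the arithmetic on the CPU before running the kernel.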