With the 2D limitation for grid layout, is there a standard method people have come up with for tiling 3D grids?
I’ve been playing around with grids of dimension [x y*z] and decomposing that second linear index, but I’ve run into trouble with dimensions that don’t divide evenly by the block size, e.g. [32 32 17] over a [4 4 4] thread block, or [32 17 23] over [4 4 4]. For dimensions of arbitrary size like that, people typically calculate the grid using some sort of “divup” function: divide by the thread block size and add one more grid block if there was a remainder.
int divup(int a, int b)
{
    if (a % b) /* non-zero remainder: b does not divide a evenly */
        return a / b + 1; /* add one more block to cover the remainder */
    return a / b; /* divides cleanly */
}
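To make the folding concrete, here is a rough sketch of how divup might be used for the [x y*z] layout. It is plain C that only emulates the index arithmetic; in a real kernel the block and thread coordinates would come from blockIdx and threadIdx. The names vec3i, folded_grid, unfold_block and element_index are made up for illustration, and it assumes the host passes the y block count to the kernel.

```c
#include <assert.h>

/* Hypothetical 3-component index type, standing in for CUDA's dim3/uint3. */
typedef struct { int x, y, z; } vec3i;

/* Same rounding-up division as above. */
static int divup(int a, int b)
{
    assert(a >= 0 && b > 0);
    if (a % b)
        return a / b + 1;
    return a / b;
}

/* Host side: fold a 3D block count into a 2D grid of size [gx, gy * gz]. */
static void folded_grid(vec3i dim, vec3i block, int *grid_x, int *grid_y)
{
    *grid_x = divup(dim.x, block.x);
    *grid_y = divup(dim.y, block.y) * divup(dim.z, block.z);
}

/* Kernel side: split the folded second index back into y and z block
   coordinates. blocks_y is divup(dim.y, block.y), passed in by the host. */
static vec3i unfold_block(int block_x, int block_yz, int blocks_y)
{
    vec3i b;
    b.x = block_x;
    b.y = block_yz % blocks_y;
    b.z = block_yz / blocks_y;
    return b;
}

/* Kernel side: compute the element this thread owns and report whether it
   lies inside the volume. Threads past the ragged edge get 0 and should
   return without touching memory. */
static int element_index(vec3i b, vec3i t, vec3i block, vec3i dim, vec3i *out)
{
    out->x = b.x * block.x + t.x;
    out->y = b.y * block.y + t.y;
    out->z = b.z * block.z + t.z;
    return out->x < dim.x && out->y < dim.y && out->z < dim.z;
}
```

For [32 17 23] over [4 4 4] this gives a folded grid of 8 by 30 blocks (5 blocks in y times 6 in z). The bounds check in element_index is what handles the sizes that don't divide cleanly: the extra blocks are launched, but their out-of-range threads do nothing.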
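Yeah, try to use just bitwise-AND. Pad your grid’s y dimension to a power of two (Po2), so the folded index can be split with a mask and a shift instead of a modulo and a divide. It should be little overhead to launch a padding block and have each of its threads return immediately.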
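A minimal sketch of that padding idea, again as plain C emulating the arithmetic rather than an actual kernel. The helper names (next_pow2, log2_pow2, unfold_pow2) are invented here, and it assumes the host computes the shift once and hands it to the kernel as a parameter or constant.

```c
#include <assert.h>

/* Host side: smallest power of two that is >= n. */
static unsigned next_pow2(unsigned n)
{
    unsigned p = 1;
    assert(n >= 1 && n <= (1u << 31));
    while (p < n)
        p <<= 1;
    return p;
}

/* Host side: log2 of a value that is already a power of two. */
static unsigned log2_pow2(unsigned p)
{
    unsigned s = 0;
    assert(p != 0 && (p & (p - 1)) == 0);
    while ((1u << s) < p)
        ++s;
    return s;
}

/* Kernel side: split the folded index using only a mask and a shift.
   shift is log2 of the padded y block count. Returns 0 for a padding
   block (y beyond the real block count), whose threads should return
   immediately. */
static int unfold_pow2(unsigned block_yz, unsigned blocks_y, unsigned shift,
                       unsigned *y, unsigned *z)
{
    *y = block_yz & ((1u << shift) - 1u);
    *z = block_yz >> shift;
    return *y < blocks_y;
}
```

With 5 real blocks in y, the padded count is 8 and the shift is 3, so the [32 17 23] example would launch 8 * 6 = 48 folded blocks instead of 30. The 18 padding blocks exit straight away, which is the overhead being traded for the cheaper index math.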
EDIT: obviously, this makes sense if you’re going to be calculating your dimensions frequently. You might do this to save a few registers or because you have many device functions and don’t want to pass around parameters needlessly. Then again, you can still just store the block’s coordinates in shared memory (smem) and then offset by threadIdx on the spot whenever you need to. Yeah, that’d be best.