"3D" grids Is there a standard method?

James_Malcolm1 · September 29, 2008, 3:49pm

With the 2D limitation for grid layout, is there a standard method people have come up with for tiling 3D grids?

I’ve been playing around with grids of dimension [x y*z] and decomposing that second linear index, but I’ve run into trouble with dimensions of sizes that don’t divide cleanly, e.g. [32 32 17] over a [4 4 4] thread block or [32 17 23] over [4 4 4]. For dimensions of such arbitrary size, people typically calculate the grid using some sort of “divup” function: divide by the thread block size and add one more grid block if there was a remainder.

int divup(int a, int b)

{

    if (a % b)  /* does a divide b leaving a remainder? */

        return a / b + 1; /* add in additional block */

    else

        return a / b; /* divides cleanly */

}

Anyone have a standard method they use?

-James

StickGuy · September 29, 2008, 4:59pm

My grids are 64 - 256 in the x and y directions and arbitrary in z so I do this:

x = threadIdx.x;

y = blockIdx.x;

z = blockIdx.y;

James_Malcolm1 · September 29, 2008, 5:12pm

Here’s a solution that works. Any better techniques come to mind?

Here’s the kernel to write out each index:

static void __global__ ind2global(float *d_out, dim3 dims, int y_div)

{

    /* grid dimension is [w h*d] */

    int x = blockIdx.x * blockDim.x + threadIdx.x;

    int y = (blockIdx.y % y_div) * blockDim.y + threadIdx.y;

    int z = (blockIdx.y / y_div) * blockDim.z + threadIdx.z;

   if (x >= dims.x || y >= dims.y || z >= dims.z) return;

   int ind = z * dims.y * dims.x + x * dims.y + y;

    d_out[ind] = ind;

}

Here’s the general launch setup using the divup() function from above:

dim3 grid(divup(dims.x, BLOCK_LEN), divup(dims.y*dims.z,BLOCK_LEN*BLOCK_LEN));

dim3 threads(BLOCK_LEN,BLOCK_LEN,BLOCK_LEN);

int y_div = divup(dims.y, BLOCK_LEN);

ind2global<<<grid,threads>>>(d_out, dims, y_div);

James_Malcolm1 · September 29, 2008, 5:38pm

Correction, that code I posted does not work in the corner cases where the remainders come into play, e.g. [4 9 5] won’t index the last block. Correcting the y-grid computation fixed that:

dim3 grid(divup(dims.x, BLOCK_LEN), divup(dims.y,BLOCK_LEN)*divup(dims.z,BLOCK_LEN));

-jm

StickGuy · September 29, 2008, 5:44pm

You probably want to avoid the integer division/modulo operations if you can help it.

alex_dubinsky · September 30, 2008, 2:47am

Yeah, try to use just bitwise-AND. Pad your grid’s y dimension to a Po2. It should be little overhead to start a pad block and immediately return out of each of its threads.

EDIT: obviously, this makes sense if you’re going to be calculating your dimensions frequently. You might do this to save a few registers or because you have many device functions and don’t want to pass around parameters needlessly. Then again, you can still just store the block’s coordinates in smem and then offset by threadIdx on the spot whenever you need to. Yeah, that’d be best.

Reimar · September 30, 2008, 6:27am

int divup(int a, int b)

{

 Â  Â if (a % b) Â /* does a divide b leaving a remainder? */

 Â  Â  Â  Â return a / b + 1; /* add in additional block */

 Â  Â else

 Â  Â  Â  Â return a / b; /* divides cleanly */

}

[snapback]445613[/snapback]

Uh, I can’t imagine anyone actually uses that code, unless they want to avoid an integer overflow issue (which I doubt you will have).

The standard formula for this, at least for constant b, is “(a + b - 1) / b”.

If you have to use above code, “return a / b + !!(a % b)” will probably create better code with many compilers, though that may not be worth the obfuscation.