I am trying to run a computation using a 2D grid and 3D blocks, but the index I use for the copies from global memory to shared memory and from registers to global memory does not let me use more than one block in the z direction. (The program works perfectly for a 2D grid with 2D blocks, and for a 2D grid with 3D blocks as long as there is only one block along z: [N*BLOCK_SIZE_X][N*BLOCK_SIZE_Y][1*BLOCK_SIZE_Z].)
[codebox]// Host side: set up execution parameters.
// The y and z block counts are packed together into dimGrid.y.
dim3 dimBlock(BLOCK_SIZE_X, BLOCK_SIZE_Y, BLOCK_SIZE_Z);
dim3 dimGrid(X / dimBlock.x, (Y / dimBlock.y) * (Z / dimBlock.z));

// Kernel side: save result in global memory
g_B[(y * x * threadIdx.z)
    + (x * (blockDim.y * blockIdx.y + threadIdx.y))
    + (blockDim.x * blockIdx.x + threadIdx.x)] = Sum;[/codebox]
One possible index for a 2D grid and 3D blocks is defined in this topic:
But the blocks are not aligned properly.
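For what it is worth, here is a rough sketch of one way the packed `blockIdx.y` might be unpacked into separate y and z block coordinates before building the flat index. It is only an illustration under assumptions: the tile sizes, the kernel name `writeFlatIndex`, and the helper `countMismatches` are hypothetical, X, Y and Z are assumed to be exact multiples of the block dimensions, and the data is assumed to be stored with x fastest, then y, then z. Each thread writes its own flat index so the host can check that every element is covered exactly once.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical tile sizes; the actual values are not given in this post.
#define BLOCK_SIZE_X 8
#define BLOCK_SIZE_Y 8
#define BLOCK_SIZE_Z 4

// Each thread stores its own flat index in g_B (x fastest, then y, then z).
__global__ void writeFlatIndex(int *g_B, unsigned int X, unsigned int Y)
{
    // dimGrid.y holds (Y / blockDim.y) * (Z / blockDim.z) blocks,
    // so split blockIdx.y back into a y part and a z part.
    const unsigned int blocksY = Y / blockDim.y;
    const unsigned int by = blockIdx.y % blocksY;
    const unsigned int bz = blockIdx.y / blocksY;

    // Global coordinates of this thread in the X * Y * Z volume.
    const unsigned int gx = blockDim.x * blockIdx.x + threadIdx.x;
    const unsigned int gy = blockDim.y * by + threadIdx.y;
    const unsigned int gz = blockDim.z * bz + threadIdx.z;

    const unsigned int idx = (gz * Y + gy) * X + gx;
    g_B[idx] = (int)idx;
}

// Host check: returns how many elements do not hold their own index,
// or -1 if a CUDA call fails.
int countMismatches(unsigned int X, unsigned int Y, unsigned int Z)
{
    const size_t n = (size_t)X * Y * Z;
    int *d_B = nullptr;
    if (cudaMalloc(&d_B, n * sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(d_B, 0xFF, n * sizeof(int)); // every int becomes -1

    dim3 dimBlock(BLOCK_SIZE_X, BLOCK_SIZE_Y, BLOCK_SIZE_Z);
    dim3 dimGrid(X / dimBlock.x, (Y / dimBlock.y) * (Z / dimBlock.z));
    writeFlatIndex<<<dimGrid, dimBlock>>>(d_B, X, Y);

    std::vector<int> h_B(n);
    cudaError_t err = cudaMemcpy(h_B.data(), d_B, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_B);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (size_t i = 0; i < n; ++i)
        if (h_B[i] != (int)i) ++mismatches;
    return mismatches;
}
```

Calling `countMismatches(32, 16, 8)` from `main` would launch a 4 x 4 grid (two blocks along y times two along z); a return value of 0 would suggest the blocks line up as intended. The point of the sketch is that the z offset comes from both the block and the thread, whereas using `threadIdx.z` alone only reaches the first `BLOCK_SIZE_Z` slices.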
Any help or comments are welcome, thanks.