How can I calculate blocks per grid?

user366312 · April 3, 2023, 2:30am

Suppose I have a GPU that allows at most MAX_THREAD threads per block.

Also, suppose it allows at most MAX_BLOCK_DIM blocks per grid in each of the x, y, and z grid dimensions.

If MAX_THREAD = 1024, and if dim3 threads_per_block is set to [32, 8, 4], as 32*8*4=1024, how can I calculate each dimension of dim3 blocks_per_grid so that I can launch a kernel like the following?

my_kernel<<<blocks_per_grid, threads_per_block>>>(... ... ...);

For example,

dim3 threads_per_block(x, y, z);
dim3 blocks_per_grid(xx, yy, zz);

Can I calculate the values of xx, yy, and zz from x, y, and z, respectively?

If not, what is the proper way to do this?

Robert_Crovella · April 3, 2023, 3:26am

Typically you would compute them as follows:

int dimx = ...;
int dimy = ...;
int dimz = ...;

dim3 block(32, 8, 4);
dim3 grid((dimx+block.x-1)/block.x, (dimy+block.y-1)/block.y, (dimz+block.z-1)/block.z);
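
As a rough host-side sketch of the computation above, here is the same arithmetic in plain C++. Dim3, make_grid, and fits_limits are hypothetical names introduced only for illustration (Dim3 stands in for CUDA's dim3 so the snippet compiles without the toolkit). The limits in fits_limits are an assumption that holds for compute capability 3.0 and later; on a real system you would read them from the maxGridSize field returned by cudaGetDeviceProperties rather than hard-coding them.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3, so this compiles as ordinary C++.
struct Dim3 { unsigned x, y, z; };

// Round each data extent up to a whole number of blocks, so the grid
// covers the data even when an extent is not a multiple of the block size.
inline Dim3 make_grid(int dimx, int dimy, int dimz, Dim3 block) {
    assert(dimx > 0 && dimy > 0 && dimz > 0);
    assert(block.x > 0 && block.y > 0 && block.z > 0);
    Dim3 grid;
    grid.x = (static_cast<unsigned>(dimx) + block.x - 1) / block.x;
    grid.y = (static_cast<unsigned>(dimy) + block.y - 1) / block.y;
    grid.z = (static_cast<unsigned>(dimz) + block.z - 1) / block.z;
    return grid;
}

// Assumed per-dimension grid limits (compute capability 3.0 and later):
// x up to 2^31-1, y and z up to 65535. Query the device to be sure.
inline bool fits_limits(Dim3 grid) {
    return grid.x <= 2147483647u && grid.y <= 65535u && grid.z <= 65535u;
}
```

For instance, make_grid(100, 20, 10, Dim3{32, 8, 4}) gives a 4 x 3 x 3 grid: 36 blocks of 1024 threads each. Note that the grid comes from the size of the data, not from the block dimensions alone.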

This assumes two things:

1. dimx, dimy, and dimz may not be evenly divisible by block.x, block.y, and block.z respectively, so you want to launch a grid of blocks that is large enough to cover your dimensions.

2. Your kernel has an appropriate thread-check, such as:

__global__ void k(..., int dimx, int dimy, int dimz){
  int idx = threadIdx.x+blockDim.x*blockIdx.x;
  int idy = threadIdx.y+blockDim.y*blockIdx.y;
  int idz = threadIdx.z+blockDim.z*blockIdx.z;
  ...
  if (idx < dimx && idy < dimy && idz < dimz){  //thread-check
    // body of kernel code
  }
}
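
To see why the thread-check matters, here is a small CPU-only sketch that walks through every (block, thread) pair of a rounded-up launch and applies the same index arithmetic and the same check as the kernel above. LaunchStats and simulate are hypothetical names used only for this illustration. It is not how a GPU executes anything, just a way to count which threads would do work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of an emulated launch.
struct LaunchStats {
    long long launched;      // total threads in the grid
    long long active;        // threads that pass the thread-check
    bool each_element_once;  // every data element touched exactly once
};

inline LaunchStats simulate(int dimx, int dimy, int dimz,
                            int bx, int by, int bz) {
    assert(dimx > 0 && dimy > 0 && dimz > 0 && bx > 0 && by > 0 && bz > 0);
    // Rounded-up grid, as in the host code above.
    const int gx = (dimx + bx - 1) / bx;
    const int gy = (dimy + by - 1) / by;
    const int gz = (dimz + bz - 1) / bz;
    std::vector<int> hits(static_cast<std::size_t>(dimx) * dimy * dimz, 0);
    LaunchStats s{0, 0, true};
    for (int bzi = 0; bzi < gz; ++bzi)
    for (int byi = 0; byi < gy; ++byi)
    for (int bxi = 0; bxi < gx; ++bxi)
    for (int tz = 0; tz < bz; ++tz)
    for (int ty = 0; ty < by; ++ty)
    for (int tx = 0; tx < bx; ++tx) {
        // Same arithmetic as threadIdx + blockDim * blockIdx in the kernel.
        const int idx = tx + bx * bxi;
        const int idy = ty + by * byi;
        const int idz = tz + bz * bzi;
        ++s.launched;
        if (idx < dimx && idy < dimy && idz < dimz) {  // thread-check
            ++s.active;
            ++hits[(static_cast<std::size_t>(idz) * dimy + idy) * dimx + idx];
        }
    }
    for (int h : hits) {
        if (h != 1) s.each_element_once = false;
    }
    return s;
}
```

With data of size 5 x 3 x 2 and a 4 x 2 x 2 block, the grid is 2 x 2 x 1, so 64 threads are launched but only 30 pass the check. Without the check, the other 34 would index outside the data.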

(a+b-1)/b is a general formula for integer round-up division of a/b. Work through a few examples until you understand it. Remember that division of positive integers in C++ truncates. The formula relies on this truncation, and yields the smallest integer greater than or equal to the actual value of a/b. For example, with a=10 and b=4, (10+4-1)/4 = 13/4 = 3, which is 2.5 rounded up.
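
A minimal sketch of the round-up formula as a helper function, with a few cases worked out in the comments. The name ceil_div is my own choice, not something from the CUDA toolkit, and the sketch assumes a is non-negative, b is positive, and a + b - 1 does not overflow int.

```cpp
#include <cassert>

// Integer round-up division: the smallest integer >= a / b.
// Assumes a >= 0, b > 0, and that a + b - 1 fits in an int.
inline int ceil_div(int a, int b) {
    assert(a >= 0 && b > 0);
    return (a + b - 1) / b;  // relies on truncating integer division
}

// Worked cases for a block width of 32:
//   ceil_div(64, 32): (64 + 31) / 32 = 95 / 32 = 2  (exact multiple, no extra block)
//   ceil_div(65, 32): (65 + 31) / 32 = 96 / 32 = 3  (one element over, one extra block)
//   ceil_div(1, 32):  (1 + 31) / 32  = 32 / 32 = 1  (a single partly filled block)
```

Plain truncating division would give 2, 2, and 0 for the same inputs, so the last 1 element in the second case and the only element in the third would never be covered.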

This online course covers these and other CUDA basics.

user366312 · April 9, 2023, 9:56pm

dimx, dimy, dimz seem to be the dimensions of the data structure.

What if my data structure is in 2D, but I want to use all the dimensions in the grid?

I.e., say, I want to run a matrix multiplication using all three grid dimensions.

Robert_Crovella · April 10, 2023, 1:49pm
