Questions about Block and Grid

Hi all,

I am confused about how blocks and grids are allocated. As far as I know, we can have 512 threads per block. Does this depend on the GPU type? What is the maximum number of blocks we can use in a grid? And how many grids can there be in total?

If I have 1000 threads to run and I would like to use a 1D block, is this what I should do?

    int BLOCK_SIZE = 512;
    int l = 1000;
    dim3 dimBlock(BLOCK_SIZE);
    dim3 dimGrid(l/BLOCK_SIZE);

I saw another post in this forum where someone had 2048 threads to run. He said he wanted to run them all in one grid with 4 blocks. This is what he did:

    char host_buf[2048] = { '\0' };
    ......
    dim3 dimBlock( sizeof(host_buf) );
    dim3 dimGrid(1);

I thought we could only use values up to 512 for dimBlock.

Could someone explain this to me?

I'd appreciate any help.

You are correct: 512 is the maximum number of threads per block on current GPUs, although the limit is a property of the device, so check it for your card. A given kernel may also be limited to fewer threads per block depending on its register and shared memory usage (see the programming guide). And 512 is not always the best block size. You should benchmark your code with block sizes ranging from 32 to 512 in multiples of 32 and choose the fastest for production runs.
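A minimal host-side sketch of that benchmarking loop, in plain C++ so it runs without a GPU. The names `candidate_block_sizes`, `pick_fastest`, and the `time_kernel` callback are hypothetical. In a real program the callback would launch your kernel with the given block size and time it (for example with CUDA events), and the upper limit could come from the `maxThreadsPerBlock` field that `cudaFuncGetAttributes` reports, which depends on both the kernel and the device.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Candidate block sizes: multiples of 32 (the warp size) from 32 up to
// max_threads. For a real kernel, max_threads would be its per-kernel limit
// (e.g. cudaFuncAttributes::maxThreadsPerBlock), which can be below 512.
std::vector<int> candidate_block_sizes(int max_threads) {
    std::vector<int> sizes;
    for (int bs = 32; bs <= max_threads; bs += 32) {
        sizes.push_back(bs);
    }
    return sizes;
}

// Return the block size with the smallest time reported by time_kernel.
// time_kernel is a hypothetical callback; in practice it would launch the
// kernel with that block size and measure it (a real benchmark would also
// repeat each run and average).
int pick_fastest(const std::vector<int>& sizes,
                 const std::function<double(int)>& time_kernel) {
    assert(!sizes.empty());
    int best = sizes[0];
    double best_time = time_kernel(best);
    for (std::size_t k = 1; k < sizes.size(); ++k) {
        double t = time_kernel(sizes[k]);
        if (t < best_time) {
            best_time = t;
            best = sizes[k];
        }
    }
    return best;
}
```

For example, `candidate_block_sizes(512)` yields the 16 sizes 32, 64, ..., 512.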

Your definitions of dimBlock and dimGrid are fine, except that you need to take the ceiling of l/BLOCK_SIZE so that you don't leave the last threads off the end :)
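A sketch of the rounding-up fix, assuming a 1D launch. It uses the integer idiom `(n + block_size - 1) / block_size` for the ceiling and simulates the launch on the host to show why the kernel also needs a bounds check: the last block usually has more threads than there are remaining elements. `blocks_for` and `simulate_launch` are hypothetical helper names; in the CUDA code this would become `dim3 dimGrid(blocks_for(l, BLOCK_SIZE));` with `if (i < n)` around the kernel body.

```cpp
#include <cassert>
#include <vector>

// Blocks needed to cover n threads, rounded up with integer arithmetic.
// (Plain n / block_size rounds down.)
int blocks_for(int n, int block_size) {
    assert(n >= 0 && block_size > 0);
    return (n + block_size - 1) / block_size;
}

// Host-side simulation of a 1D launch: grid_size blocks of block_size
// threads. Each simulated thread computes the index a kernel would,
// blockIdx.x * blockDim.x + threadIdx.x, and only does work when
// i < n, mirroring the device-side guard `if (i < n) { ... }`.
// Returns how many times each element was processed.
std::vector<int> simulate_launch(int n, int grid_size, int block_size) {
    std::vector<int> visits(n, 0);
    for (int block = 0; block < grid_size; ++block) {
        for (int thread = 0; thread < block_size; ++thread) {
            int i = block * block_size + thread;
            if (i < n) {  // surplus threads in the last block do nothing
                ++visits[i];
            }
        }
    }
    return visits;
}
```

With l = 1000 and BLOCK_SIZE = 512, `blocks_for` gives 2 blocks (1024 threads), and the 24 surplus threads skip the work. With the truncating `l/BLOCK_SIZE`, only one block runs, so elements 512 to 999 are never processed.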

The maximum grid size is 65535 x 65535 blocks, which is practically unlimited.
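Note that on this hardware a 1D grid tops out at 65535 blocks; the 65535 x 65535 figure is only reached with a 2D grid. One common workaround when a 1D problem needs more blocks is to fold the block count into a 2D grid and rebuild a linear block index inside the kernel. This sketch assumes the 65535 per-dimension limit quoted above; `Grid2D`, `fold_grid`, and `linear_block` are hypothetical names.

```cpp
#include <cassert>

// Grid dimensions for a 2D grid; mirrors the x and y fields of dim3.
struct Grid2D {
    int x;
    int y;
};

const int kMaxGridDim = 65535;  // per-dimension grid limit quoted in this thread

// Spread num_blocks over a 2D grid so neither dimension exceeds kMaxGridDim.
// The result may contain a few more blocks than requested (x * y >= num_blocks).
Grid2D fold_grid(long long num_blocks) {
    assert(num_blocks > 0);
    assert(num_blocks <= (long long)kMaxGridDim * kMaxGridDim);
    long long y = (num_blocks + kMaxGridDim - 1) / kMaxGridDim;  // rows needed
    long long x = (num_blocks + y - 1) / y;                      // blocks per row
    return Grid2D{(int)x, (int)y};
}

// Linear block index as a kernel would compute it from a 2D grid:
// blockIdx.y * gridDim.x + blockIdx.x.
long long linear_block(const Grid2D& g, int bx, int by) {
    return (long long)by * g.x + bx;
}
```

Because the folded grid can hold slightly more blocks than requested, the kernel should skip linear block indices at or beyond the requested count, just as it skips surplus threads.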

There is only one grid per kernel launch. If you launch a second kernel, it gets its own block configuration and its own grid, which runs sequentially after the first kernel.

Thank you, MisterAnderson42.

Could you please also tell me the maximum number of blocks a grid can have?

Also, do you mean the second code snippet is wrong? Will there be an error, or will only the first 512 threads run?

I already did in the previous post. This information is also clearly documented in the programming guide and available via running the device info program from the SDK.
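On the 2048-thread snippet: a launch whose block exceeds the device limit is rejected as a whole rather than cut down to 512 threads. The kernel does not run, and `cudaGetLastError()` after the launch returns an invalid-configuration error. A minimal sketch of checking a configuration on the host before launching follows. `LaunchLimits`, `Dims`, and `launch_fits` are hypothetical; the field names mirror `cudaDeviceProp` (`maxThreadsPerBlock`, `maxThreadsDim`, `maxGridSize`), whose values the SDK's device query program prints.

```cpp
#include <cassert>

// Subset of the device limits, with field names mirroring cudaDeviceProp.
// This struct itself is only for illustration.
struct LaunchLimits {
    int maxThreadsPerBlock;
    int maxThreadsDim[3];
    int maxGridSize[3];
};

// Block or grid dimensions, like the x, y, z fields of dim3.
struct Dims {
    int x, y, z;
};

// True if a launch with these block and grid dimensions fits the limits:
// every dimension in range, and total threads per block within the cap.
bool launch_fits(const LaunchLimits& lim, Dims block, Dims grid) {
    int b[3] = {block.x, block.y, block.z};
    int g[3] = {grid.x, grid.y, grid.z};
    for (int d = 0; d < 3; ++d) {
        if (b[d] < 1 || b[d] > lim.maxThreadsDim[d]) return false;
        if (g[d] < 1 || g[d] > lim.maxGridSize[d]) return false;
    }
    long long threads = (long long)block.x * block.y * block.z;
    return threads <= lim.maxThreadsPerBlock;
}
```

With the compute capability 1.x limits (512 threads per block, block dimensions 512 x 512 x 64, grid 65535 x 65535 x 1), `launch_fits` accepts a 512 x 1 x 1 block but rejects both 2048 x 1 x 1 and 32 x 32 x 1 (1024 threads).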

Integer division rounds down, so 1000/512 is 1.

Thank you so much.