Question about grid/block/thread sizes

zarnick · November 13, 2012, 1:19pm

A very simple question, I believe.
The deviceQuery output from the CUDA samples shows that my device has these maximum sizes:

  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 65535

I know that I can "comfortably" work with this configuration:

  dim3 blocks(65535);
  dim3 threads(1024);
  kernel<<<blocks, threads>>>();

However, if I'm guessing correctly, I cannot work with something like this:

  dim3 blocks(65535, 65535);
  dim3 threads(1024, 1024);
  kernel<<<blocks, threads>>>();

Because I have a maximum of 1024 threads per block, and here I am actually requesting 1024 threads in each of two dimensions (1024 x 1024 = 1,048,576 threads per block). Is this correct?

Thank you.

pasoleatis · November 13, 2012, 2:37pm

Hello,

It is not correct to request 1024x1024. The total number of threads per block must not exceed 1024; the number of blocks you used is fine. If you use dim3 threads(1024,1024), the launch fails with an invalid configuration error and the kernel is not executed. You could use, for example, dim3 threads(32,32), which gives 32 x 32 = 1024 threads per block.
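
To make the 32x32 suggestion concrete, here is a minimal sketch of how the grid could be sized so that 32x32 blocks cover a 2D problem, using ceiling division. It is plain host-side C++ so it runs without a GPU; `Dim3`, `div_up` and `grid_for` are hypothetical names for illustration (with `Dim3` standing in for CUDA's `dim3`), and the 1024x1024 problem size is only an assumed example.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3, so this compiles without the CUDA toolkit.
struct Dim3 { unsigned x, y, z; };

// Ceiling division: the smallest number of chunks of size d that cover n items.
inline unsigned div_up(unsigned n, unsigned d) {
    assert(d > 0);
    return (n + d - 1) / d;
}

// Grid dimensions needed to cover a width x height domain with a 2D block.
inline Dim3 grid_for(unsigned width, unsigned height, Dim3 block) {
    return Dim3{div_up(width, block.x), div_up(height, block.y), 1};
}

// Usage sketch: an assumed 1024 x 1024 problem with 32 x 32 threads per block.
inline void grid_example() {
    const Dim3 threads{32, 32, 1};            // 32 * 32 = 1024 threads per block
    const Dim3 blocks = grid_for(1024, 1024, threads);
    assert(blocks.x == 32 && blocks.y == 32); // 32 x 32 blocks of 1024 threads each
}
```

In real CUDA code the same numbers would go into `dim3 blocks` and `dim3 threads` and be passed as `kernel<<<blocks, threads>>>(...)`. If the problem size is not a multiple of the block size, the last blocks contain extra threads, so the kernel would need a bounds check.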

If you have dim3 threads(tx,ty,tz), the following rules apply: tx*ty*tz <= 1024, tx <= 1024, ty <= 1024, tz <= 64.
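
The rules above can be sketched as a small check. This is plain host-side C++ rather than a real CUDA call; `Dim3`, `BlockLimits` and `block_is_valid` are hypothetical names, and the limits are simply the deviceQuery values quoted in this thread. In an actual program the limits would more likely come from `cudaGetDeviceProperties` (the `maxThreadsPerBlock` and `maxThreadsDim` fields) than be hard-coded.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for CUDA's dim3, so this compiles without the CUDA toolkit.
struct Dim3 { unsigned x, y, z; };

// Hypothetical container for the per-block limits that deviceQuery reports.
struct BlockLimits {
    unsigned max_threads_per_block; // e.g. 1024
    unsigned max_dim[3];            // e.g. 1024, 1024, 64
};

// True if the block shape respects both the per-dimension limits
// and the limit on the total number of threads per block.
inline bool block_is_valid(Dim3 b, const BlockLimits& lim) {
    if (b.x == 0 || b.y == 0 || b.z == 0) return false;
    if (b.x > lim.max_dim[0] || b.y > lim.max_dim[1] || b.z > lim.max_dim[2]) return false;
    // 64-bit product so that large shapes cannot overflow.
    const std::uint64_t total = std::uint64_t(b.x) * b.y * b.z;
    return total <= lim.max_threads_per_block;
}

// Usage sketch with the limits quoted in this thread.
inline void block_examples() {
    const BlockLimits lim{1024, {1024, 1024, 64}};
    assert(block_is_valid(Dim3{1024, 1, 1}, lim));     // 1024 threads: OK
    assert(block_is_valid(Dim3{32, 32, 1}, lim));      // 32 * 32 = 1024: OK
    assert(!block_is_valid(Dim3{1024, 1024, 1}, lim)); // 1,048,576 threads: too many
    assert(!block_is_valid(Dim3{1, 1, 128}, lim));     // z exceeds 64
}
```

Note that each per-dimension maximum is only an upper bound for that dimension on its own; the product limit is what rules out 1024x1024.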

zarnick · November 13, 2012, 3:15pm

Thanks pasoleatis, that is exactly what I was thinking. Too bad, it would really simplify my life if I could have that many threads, hehehe.

wanderine · November 13, 2012, 3:23pm

For the GTX 680, the maximum number of threads per multiprocessor is 2048; the maximum number of threads per thread block is still 1024.
