maximum number of blocks

hieronymus · April 10, 2008, 8:46am

I have 64 threads per block; more threads per block does not make sense for my problem. I use __syncthreads heavily. My kernel uses about 16 registers. I can use either 2000 bytes of shared memory or the full 16 KB (which gives faster performance).

Can somebody tell me the maximum number of blocks I can launch? Is it limited by the total number of registers? That would mean 8192/16 = 512?

seibert · April 10, 2008, 12:19pm

The maximum number of blocks is 65535 in each dimension (p. 74 of the Programming Guide). The CUDA driver will run as many of them simultaneously as possible, and that depends on register and shared memory usage. So not all blocks are guaranteed to be running at the same time; some will be queued, waiting for other blocks to finish.
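Because one grid dimension tops out at 65535, a launch that needs more blocks has to spread them over both dimensions. Below is a minimal host-side sketch in C++, assuming the 2D grids described above; `Grid2D`, `makeGrid`, and `linearBlockId` are hypothetical names for illustration, not part of the CUDA API.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Per-dimension grid limit quoted above; grids are assumed to be 2D here.
constexpr std::uint64_t kMaxGridDim = 65535;

// Hypothetical host-side stand-in for the x/y fields of CUDA's dim3.
struct Grid2D {
    std::uint64_t x;
    std::uint64_t y;
};

// Pick a grid with at least totalBlocks blocks and each dimension <= kMaxGridDim.
Grid2D makeGrid(std::uint64_t totalBlocks) {
    if (totalBlocks == 0 || totalBlocks > kMaxGridDim * kMaxGridDim) {
        throw std::invalid_argument("block count outside 1 .. 65535*65535");
    }
    Grid2D g{totalBlocks, 1};
    if (totalBlocks > kMaxGridDim) {
        // Fill x completely, then add as many rows in y as needed (ceiling division).
        g.x = kMaxGridDim;
        g.y = (totalBlocks + kMaxGridDim - 1) / kMaxGridDim;
    }
    assert(g.x <= kMaxGridDim && g.y <= kMaxGridDim && g.x * g.y >= totalBlocks);
    return g;
}

// Mirrors blockIdx.y * gridDim.x + blockIdx.x inside a kernel. Because the grid
// may be rounded up, a kernel would return early when this is >= totalBlocks.
std::uint64_t linearBlockId(std::uint64_t bx, std::uint64_t by, const Grid2D& g) {
    return by * g.x + bx;
}
```

For example, 100000 blocks become a 65535 x 2 grid; the 31070 surplus blocks would simply exit at the top of the kernel. In real host code the two values would be passed as `dim3 grid(g.x, g.y)` in the launch configuration, and the driver still queues whatever does not fit on the multiprocessors at once.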

hieronymus · April 10, 2008, 12:55pm

I understand that the number of registers has an influence. But what about shared memory? Does that mean shared memory gets swapped, or will some of it be left unused? Remember, I use 64 threads per block.

MisterAnderson42 · April 10, 2008, 2:00pm

Only the occupancy depends on the number of registers, the block size, and the shared memory usage. None of these influence the total number of blocks you can launch, which is 65535*65535.

Shared memory is not swapped. Each block gets its own exclusive section of shared memory from the start of the block's execution to the end. With your 2000 bytes of shared memory per block, some may be left unused; use the occupancy calculator spreadsheet to determine this.
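To make this concrete for the numbers in the question, here is a minimal sketch of the per-multiprocessor arithmetic behind occupancy. It assumes G80-class limits (compute capability 1.0/1.1: 8192 registers, 16 KB shared memory, 768 threads and 8 blocks per multiprocessor) and deliberately ignores allocation granularity and the shared memory used for kernel arguments, so the occupancy calculator remains the authoritative answer. `SmLimits`, `BlockUsage`, and the helper functions are hypothetical names. It also shows that 8192/16 = 512 counts resident threads (8 blocks of 64), not the number of blocks in the grid.

```cpp
#include <algorithm>
#include <cassert>

// ASSUMED per-multiprocessor limits for G80-class parts (compute capability 1.0/1.1).
struct SmLimits {
    int registers = 8192;
    int sharedBytes = 16384;
    int maxThreads = 768;
    int maxBlocks = 8;
};

// Resources one block needs (hypothetical helper type).
struct BlockUsage {
    int threads;
    int regsPerThread;
    int sharedBytesPerBlock;
};

// Resident blocks per multiprocessor: the tightest of the independent limits.
// Simplified: no register/shared-memory allocation granularity, no kernel-argument space.
int residentBlocks(const BlockUsage& u, const SmLimits& sm = SmLimits{}) {
    assert(u.threads > 0 && u.regsPerThread > 0 && u.sharedBytesPerBlock >= 0);
    int byRegs = sm.registers / (u.threads * u.regsPerThread);
    int byThreads = sm.maxThreads / u.threads;
    int byShared = u.sharedBytesPerBlock > 0 ? sm.sharedBytes / u.sharedBytesPerBlock
                                             : sm.maxBlocks;
    return std::min({byRegs, byThreads, byShared, sm.maxBlocks});
}

// Shared memory on one multiprocessor that no resident block claims.
int unusedSharedBytes(const BlockUsage& u, const SmLimits& sm = SmLimits{}) {
    return sm.sharedBytes - residentBlocks(u, sm) * u.sharedBytesPerBlock;
}

// Active warps divided by the maximum number of warps per multiprocessor.
double occupancy(const BlockUsage& u, const SmLimits& sm = SmLimits{}) {
    int warpsPerBlock = (u.threads + 31) / 32;
    int maxWarps = sm.maxThreads / 32;
    return static_cast<double>(residentBlocks(u, sm) * warpsPerBlock) / maxWarps;
}
```

Under these simplifications, 64 threads with 16 registers and 2000 bytes of shared memory give 8 resident blocks per multiprocessor (512 threads, 16 of 24 warps, roughly 67% occupancy) with 384 bytes of shared memory idle, while using the full 16 KB leaves room for only one resident block. Either way the grid itself can still hold up to 65535*65535 blocks; the rest wait their turn.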
