How to decide the optimal block size in CUDA

laughingrice · February 13, 2010, 6:49am

The limit is 512 threads per block (1024 on compute capability 2.0 and later). Usually 128-256 is a good number, depending on the resources the kernel uses (registers and shared memory).

You want as many threads as possible to be scheduled on each multiprocessor (roughly 192 is a good minimum) so that the latency of register accesses can be hidden.

There is a limit on the number of active blocks per multiprocessor, so you want blocks large enough to reach that minimum thread count. On the other hand, every active block needs its own registers and shared memory, so if a block uses a lot of either, there won’t be enough resources to schedule it alongside other blocks (or, in the worst case, at all).
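To make the trade-off concrete, here is a rough host-side sketch of the arithmetic. The `SmLimits` struct and both function names are made up for illustration, and the default limits are assumptions that roughly match early compute capability 1.x parts; substitute the numbers for your own device. It also ignores register and shared memory allocation granularity, so treat the result as an upper bound.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-multiprocessor limits. The defaults are assumptions that
// roughly match early compute capability 1.x hardware; replace them with the
// values for the device you actually run on.
struct SmLimits {
    int max_threads = 768;         // resident threads per multiprocessor
    int max_blocks = 8;            // resident blocks per multiprocessor
    int registers = 8192;          // 32-bit registers per multiprocessor
    int shared_mem_bytes = 16384;  // shared memory per multiprocessor
};

// Upper bound on the number of blocks that can be active on one
// multiprocessor at the same time. Allocation granularity is ignored.
int active_blocks_per_sm(const SmLimits& sm, int threads_per_block,
                         int regs_per_thread, int smem_per_block) {
    assert(threads_per_block > 0 && regs_per_thread > 0 && smem_per_block >= 0);
    int by_threads = sm.max_threads / threads_per_block;
    int by_regs = sm.registers / (regs_per_thread * threads_per_block);
    // A block that uses no shared memory is not limited by it.
    int by_smem = smem_per_block > 0 ? sm.shared_mem_bytes / smem_per_block
                                     : sm.max_blocks;
    return std::min({sm.max_blocks, by_threads, by_regs, by_smem});
}

// Active threads per multiprocessor for the same configuration.
int active_threads_per_sm(const SmLimits& sm, int threads_per_block,
                          int regs_per_thread, int smem_per_block) {
    return threads_per_block *
           active_blocks_per_sm(sm, threads_per_block, regs_per_thread,
                                smem_per_block);
}
```

With these assumed limits, 16-thread blocks hit the 8-block cap and leave only 128 active threads, below the 192 mentioned above, while 256-thread blocks at 32 registers per thread fit just one block per multiprocessor.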

You also want the block size to be a multiple of 32 (the warp size) or you’ll be wasting threads, and if possible have groups of 16 threads access consecutive memory for coalescing.
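A small sketch of the "wasting threads" point: the hardware schedules whole warps, so a block size that is not a multiple of 32 still occupies a full last warp. The helper names below are hypothetical, picked just for this example.

```cpp
#include <cassert>

const int kWarpSize = 32;

// Number of warps the hardware has to schedule for one block.
int warps_needed(int threads_per_block) {
    assert(threads_per_block > 0);
    return (threads_per_block + kWarpSize - 1) / kWarpSize;
}

// Lanes in the last warp that are allocated but never do useful work.
int idle_lanes(int threads_per_block) {
    return warps_needed(threads_per_block) * kWarpSize - threads_per_block;
}

// Smallest multiple of the warp size that is >= threads_per_block.
int round_up_to_warp(int threads_per_block) {
    return warps_needed(threads_per_block) * kWarpSize;
}
```

For example, a 100-thread block still takes 4 warps, so 28 lanes sit idle; rounding up to 128 threads costs nothing extra in warps.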

On the other hand, you want enough blocks to spread over all the multiprocessors and fully utilize the card, so you don’t want the blocks too large if your problem is small.
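One way to picture this is a simple heuristic, sketched below under my own assumptions: start from the larger block sizes and step down until the grid has at least one block per multiprocessor. The function names and the candidate list (256, 192, 128, 64) are illustrative choices, not an official rule.

```cpp
#include <cassert>

// Number of blocks needed to cover n elements with one thread per element.
int blocks_for(int n, int threads_per_block) {
    assert(n > 0 && threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

// Hypothetical heuristic: pick the largest candidate block size that still
// gives every multiprocessor at least one block. If the problem is too small
// for that, fall back to the smallest candidate.
int choose_block_size(int n, int num_sms) {
    assert(num_sms > 0);
    const int candidates[] = {256, 192, 128, 64};
    for (int c : candidates) {
        if (blocks_for(n, c) >= num_sms) {
            return c;
        }
    }
    return 64;
}
```

For 4096 elements on a card with 30 multiprocessors, 256-thread blocks give only 16 blocks, so the heuristic drops to 128 threads (32 blocks). For a million elements it just keeps 256.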

I find that for 2D problems, blocks of 16x16, 16x12, or at worst 16x8 do a good job.
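As a quick check of those shapes, the sketch below computes the thread count per block and the grid needed to cover a 2D problem. `Dim2` is a hypothetical stand-in for CUDA's `dim3`, used so the arithmetic compiles as plain host code.

```cpp
#include <cassert>

// Hypothetical stand-in for dim3, restricted to two dimensions.
struct Dim2 {
    int x;
    int y;
};

// Total threads in a 2D block.
int threads_in(Dim2 block) {
    return block.x * block.y;
}

// Grid dimensions needed to cover the whole problem, rounding up so that
// partial tiles at the right and bottom edges still get a block.
Dim2 grid_for(Dim2 problem, Dim2 block) {
    assert(block.x > 0 && block.y > 0);
    Dim2 grid;
    grid.x = (problem.x + block.x - 1) / block.x;
    grid.y = (problem.y + block.y - 1) / block.y;
    return grid;
}
```

The three shapes come out to 256, 192 and 128 threads, all multiples of 32. Because threads are numbered x-first within a block, a width of 16 puts each row of the block in its own half-warp, which fits the coalescing advice above when x indexes consecutive memory. The kernel still needs a bounds check, since the rounded-up grid overshoots the problem edges.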
