General Formula for Thread/Block Ratio

sepia.latimanus · June 2, 2011, 3:41pm

Hi all,

I’m fairly new to CUDA, and while a lot of it seems pretty straightforward (thank goodness for the people who made GPU parallelization this easy!), I feel I’m in the dark about how to choose a healthy ratio between blocks and threads, or whether it even matters.

Here’s the general layout of my application-to-be:

Basically, it’s a 1-, 2-, or 3-D fluid dynamics grid solver, so at some point I want one thread to run for each cube in the grid. The order in which the grid cubes are calculated is not important. I’m starting with the 1-D case, which could have anywhere from 2 to more than 70,000 grid cubes, so I don’t want to hard-code the number of blocks in the kernel call. Would a reasonable technique be to use a series of IF statements to decide whether I’ll have 1 thread per block, 128 threads per block, or 512, depending on the total number of threads required?

For all I know this issue is more or less unimportant, but minimizing runtime is the objective, and I’m afraid that since I don’t have much formal CS training, the theory of optimizing hardware usage is somewhat opaque to me.

Thanks!

S

seibert · June 2, 2011, 4:48pm

One hardware-level guideline is to make sure your block size is a multiple of the warp size, which is 32 for all devices so far. The hardware executes instructions on entire warps, not individual threads, so if a block does not have enough threads to fill its last warp, some CUDA cores sit idle part of the time and your effective throughput drops.
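
To make the warp arithmetic concrete, here is a rough host-side sketch in plain C++. The helper names (`warps_per_block`, `idle_lanes`, `round_up_to_warp`) are made up for illustration and are not part of the CUDA API; the warp size of 32 is hard-coded as an assumption.

```cpp
#include <cassert>

// Assumption: warp size is 32, as on all devices discussed in this thread.
const int kWarpSize = 32;

// Hypothetical helper: number of warps needed to hold one block
// of `block_size` threads (ceiling division by the warp size).
inline int warps_per_block(int block_size) {
    assert(block_size > 0);
    return (block_size + kWarpSize - 1) / kWarpSize;
}

// Hypothetical helper: lanes in the last warp that have no thread
// assigned because the block does not fill the warp completely.
inline int idle_lanes(int block_size) {
    return warps_per_block(block_size) * kWarpSize - block_size;
}

// Hypothetical helper: smallest multiple of the warp size that is
// greater than or equal to `block_size`.
inline int round_up_to_warp(int block_size) {
    return warps_per_block(block_size) * kWarpSize;
}
```

With these definitions, a block of 1 thread leaves 31 of 32 lanes unused, a block of 100 threads leaves 28 unused (and would round up to 128), and a block of 128 threads leaves none unused.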

You should, if possible, design your code so that you can benchmark it with block sizes ranging from 32 up to 512 threads per block, in multiples of 32. Many people find that the optimal number of threads per block is not what they would predict, as it can depend on subtle timing issues.
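
One possible way to set up such a sweep is to generate the candidate block sizes on the host and time the kernel once per candidate. The sketch below only builds the list; `candidate_block_sizes` is a hypothetical name, the defaults (32 and 512) come from the range mentioned above, and the timing loop itself is left to you.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: every multiple of `warp_size` from `warp_size`
// up to and including `max_block`. Defaults follow the 32..512 range
// suggested in this post; adjust them to your device's limits.
inline std::vector<int> candidate_block_sizes(int warp_size = 32,
                                              int max_block = 512) {
    assert(warp_size > 0 && max_block >= warp_size);
    std::vector<int> sizes;
    for (int s = warp_size; s <= max_block; s += warp_size) {
        sizes.push_back(s);
    }
    return sizes;
}
```

With the defaults this gives 16 candidates (32, 64, ..., 512). If the block size is a run-time variable rather than a compile-time constant, the same kernel can be launched and timed once for each entry.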

Generally speaking, 128 to 256 threads per block is a good starting place.
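
As for the IF statements in the original question: with a fixed block size, the number of blocks can usually be computed with a ceiling division instead. A minimal host-side sketch, assuming a 1-D grid with one thread per cell (`blocks_needed` is a hypothetical name, not a CUDA function):

```cpp
#include <cassert>

// Hypothetical helper: number of blocks needed so that
// blocks * threads_per_block >= n_cells (ceiling division).
// Assumes n_cells is small enough that the sum below does not overflow int.
inline int blocks_needed(int n_cells, int threads_per_block) {
    assert(n_cells > 0 && threads_per_block > 0);
    return (n_cells + threads_per_block - 1) / threads_per_block;
}

// Hypothetical helper: threads launched beyond the last cell.
// The kernel must skip these, typically with a guard such as
//   if (i < n_cells) { ... }
// on the computed global index i.
inline int surplus_threads(int n_cells, int threads_per_block) {
    return blocks_needed(n_cells, threads_per_block) * threads_per_block
           - n_cells;
}
```

For 128 threads per block this yields 1 block for 2 cells and 547 blocks for 70,000 cells, with 16 surplus threads in the latter case, so the same launch code covers both ends of the range. For very small grids most of the single block is idle, but at that size the run time is unlikely to matter.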
