How to choose the number of blocks and threads in a kernel call

local_hero · November 25, 2011, 10:35am

Hi people, I’m new to CUDA programming, so I’d like to ask a basic question: what logic should I use to choose the number of blocks and the number of threads per block when launching a kernel? Is there a general formula to follow, or do you have any advice for me? Thanks a lot!

cmaster.matso · November 25, 2011, 11:00am

First of all, see the CUDA references and guides provided with the CUDA toolkit installation :)

The maximum number of threads you can use depends on your architecture (number of multiprocessors, maximum number of threads per block, etc.). Exceeding the per-block limit makes the kernel launch fail rather than just run slowly, and even a valid configuration that fits the hardware poorly can lead to a performance drop.
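To see the limits for a particular card, the runtime API can be queried directly. The following is a minimal sketch, assuming device 0 is the GPU of interest; `blockSizeIsValid` is a hypothetical helper introduced only for illustration, and it checks nothing beyond the per-block thread limit.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if threadsPerBlock respects the per-block limit.
bool blockSizeIsValid(const cudaDeviceProp& prop, int threadsPerBlock) {
    return threadsPerBlock >= 1 && threadsPerBlock <= prop.maxThreadsPerBlock;
}

int main() {
    const int device = 0;  // assumption: the first GPU in the system
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    std::printf("Device:                         %s\n", prop.name);
    std::printf("Multiprocessors:                %d\n", prop.multiProcessorCount);
    std::printf("Warp size:                      %d\n", prop.warpSize);
    std::printf("Max threads per block:          %d\n", prop.maxThreadsPerBlock);
    std::printf("Max threads per multiprocessor: %d\n",
                prop.maxThreadsPerMultiProcessor);
    std::printf("Max block dimensions:           %d x %d x %d\n",
                prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    std::printf("Max grid dimensions:            %d x %d x %d\n",
                prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);

    // Example check for one candidate block size.
    std::printf("256 threads per block allowed:  %s\n",
                blockSizeIsValid(prop, 256) ? "yes" : "no");
    return 0;
}
```

The `deviceQuery` sample shipped with the toolkit prints the same information in more detail.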

The next thing is thread utilization: making sure that all the threads you launch are actually used and active. That is not always simple, depending on the problem you are solving with CUDA.
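A common way to keep almost every launched thread busy in a 1D problem is to round the block count up and guard the tail with a bounds check. This is only a sketch under assumptions: the kernel `scaleByTwo`, the helper `blocksFor`, and the sizes used are made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Smallest number of blocks such that blocks * threadsPerBlock >= n.
inline int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Hypothetical example kernel: doubles every element of data.
__global__ void scaleByTwo(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // threads past the end of the array do nothing
        data[i] *= 2.0f;
    }
}

int main() {
    const int n = 1000;
    const int threadsPerBlock = 256;
    const int blocks = blocksFor(n, threadsPerBlock);  // 4 blocks = 1024 threads

    float* d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d_data, 0, n * sizeof(float));

    scaleByTwo<<<blocks, threadsPerBlock>>>(d_data, n);
    cudaError_t err = cudaGetLastError();               // launch configuration errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // errors during execution

    std::printf("blocks=%d threadsPerBlock=%d status=%s\n",
                blocks, threadsPerBlock, cudaGetErrorString(err));
    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```

With these numbers only the last 24 of the 1024 launched threads are idle. Divergent branches inside the kernel can still leave threads inactive, which is the harder, problem-dependent part.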

Regards,

MK

pasoleatis · November 25, 2011, 11:25am

Apart from the hard limits of your device, you will have to experiment a little to find out which combination gives the best results.

kbam · November 27, 2011, 10:56pm

The optimum number of threads per block depends on the application, so try to write your code so that it is easy to change the number of threads per block and, as pasoleatis said, experiment a little. Somewhere in the range of 128 to 256 is often a good starting point.
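One way to make that experimenting easy is to pass the block size as an ordinary variable and time a few candidates with CUDA events. The sketch below is illustrative only: the `saxpy` kernel, the helper `timeLaunch`, the problem size, and the candidate list are all assumptions, and a single timed launch is a rough measure rather than a proper benchmark.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical test kernel: y = a * x + y.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Times one launch with the given block size.
// Returns elapsed milliseconds, or -1.0f if the launch failed.
float timeLaunch(int n, int threadsPerBlock, const float* d_x, float* d_y) {
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    saxpy<<<blocks, threadsPerBlock>>>(n, 2.0f, d_x, d_y);
    cudaError_t launchErr = cudaGetLastError();  // catches invalid configurations
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = -1.0f;
    if (launchErr == cudaSuccess) cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const int n = 1 << 22;  // assumed problem size
    float *d_x = nullptr, *d_y = nullptr;
    if (cudaMalloc(&d_x, n * sizeof(float)) != cudaSuccess) return 1;
    if (cudaMalloc(&d_y, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d_x, 0, n * sizeof(float));
    cudaMemset(d_y, 0, n * sizeof(float));

    timeLaunch(n, 256, d_x, d_y);  // warm-up launch, result ignored

    const int candidates[] = {64, 128, 256, 512};  // block sizes to try
    for (int threadsPerBlock : candidates) {
        float ms = timeLaunch(n, threadsPerBlock, d_x, d_y);
        std::printf("threadsPerBlock=%4d  time=%.3f ms\n", threadsPerBlock, ms);
    }

    cudaFree(d_x);
    cudaFree(d_y);
    return 0;
}
```

Keeping the candidates as multiples of the warp size (32) avoids partially filled warps. For real measurements, averaging over many launches gives more stable numbers.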

Enjoy!