How to choose the number of blocks and threads in a kernel call

Hi everyone, I’m new to CUDA programming, so I’d like to ask something basic: what logic should I use to choose the number of blocks and the number of threads per block when launching a kernel? Is there a general formula to follow, or do you have any advice? Thanks a lot!

First of all, see the CUDA references and guides provided with the CUDA toolkit installation :)

The maximum number of threads you can use depends on your architecture (number of multiprocessors, maximum number of threads per block, etc.). Asking for more threads per block than the device supports does not just cost performance: the launch fails with an invalid configuration error and the kernel does not run.
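To see the limits for your own card, you can ask the runtime. This is only a minimal sketch: `fitsDeviceLimits` and `printLaunchLimits` are names made up for this post, and the 256-thread block is just an example shape. The fields read from `cudaDeviceProp` are the ones that constrain a launch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if `block` respects the per-block limits in `prop`.
bool fitsDeviceLimits(const cudaDeviceProp& prop, dim3 block) {
    // Use a wide type so typical over-sized requests do not wrap a 32-bit product.
    unsigned long long total =
        (unsigned long long)block.x * block.y * block.z;
    return total > 0 &&
           total <= (unsigned long long)prop.maxThreadsPerBlock &&
           block.x <= (unsigned)prop.maxThreadsDim[0] &&
           block.y <= (unsigned)prop.maxThreadsDim[1] &&
           block.z <= (unsigned)prop.maxThreadsDim[2];
}

// Hypothetical helper: print the limits that constrain a launch on `device`.
cudaError_t printLaunchLimits(int device) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return err;
    }
    printf("device %d: %s\n", device, prop.name);
    printf("  multiprocessors:        %d\n", prop.multiProcessorCount);
    printf("  warp size:              %d\n", prop.warpSize);
    printf("  max threads per block:  %d\n", prop.maxThreadsPerBlock);
    printf("  max threads per SM:     %d\n", prop.maxThreadsPerMultiProcessor);
    printf("  max block dims:         %d x %d x %d\n",
           prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    printf("  max grid dims:          %d x %d x %d\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    // Example check for a 1D block of 256 threads (an arbitrary choice).
    printf("  256-thread block fits:  %s\n",
           fitsDeviceLimits(prop, dim3(256)) ? "yes" : "no");
    return cudaSuccess;
}
```

Calling `printLaunchLimits(0)` from `main` prints the limits for the first device in the system.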

The next thing is thread utilization: making sure that all the threads you launch are actually active and doing useful work. That is not always simple; it depends on the problem you are solving with CUDA.
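For a simple 1D problem, a common pattern is to round the number of blocks up so that every element gets a thread, and to guard the kernel so the few surplus threads do nothing. The sketch below assumes one thread per element and `n > 0`; the kernel `scale` and the helpers `blocksFor` and `scaleOnDevice` are hypothetical names, not something from the toolkit.

```cuda
#include <cuda_runtime.h>

// Example kernel: each thread handles at most one element.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {                                    // surplus threads in the last block skip the work
        data[i] *= factor;
    }
}

// Ceiling division: the smallest number of blocks that covers n elements.
// Assumes n > 0 and threadsPerBlock > 0.
inline int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Launch `scale` on a device buffer with a caller-chosen block size.
// Returns the launch status so a bad configuration is not silently ignored.
cudaError_t scaleOnDevice(float* d_data, int n, float factor, int threadsPerBlock) {
    scale<<<blocksFor(n, threadsPerBlock), threadsPerBlock>>>(d_data, n, factor);
    return cudaGetLastError();
}
```

With n = 1000 and 256 threads per block this launches 4 blocks (1024 threads), so only the last 24 threads are idle.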



Apart from the hard limits of your device, you will have to experiment a little to find out which combination gives the best results.

The optimum number of threads per block depends on the application, so try to write your code so that the number of threads per block is easy to change and, as pasoleatis said, experiment a little. Somewhere in the range of 128 to 256 is often a good starting point.
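One way to experiment is to make the block size a plain function argument and time the same kernel with a few candidates. This is a rough sketch, not a proper benchmark: `scale` is a placeholder kernel, `timeLaunchMs` and `pickBlockSize` are hypothetical helpers, the candidate list is arbitrary, and a single timed launch per size is noisy, so average several runs before trusting the numbers.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel; substitute your own.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Time one launch with the given block size.
// Returns milliseconds, or a negative value if anything failed.
float timeLaunchMs(float* d_data, int n, int threadsPerBlock) {
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up
    cudaEvent_t start, stop;
    if (cudaEventCreate(&start) != cudaSuccess) return -1.0f;
    if (cudaEventCreate(&stop) != cudaSuccess) {
        cudaEventDestroy(start);
        return -1.0f;
    }
    float ms = -1.0f;
    cudaEventRecord(start);
    scale<<<blocks, threadsPerBlock>>>(d_data, n, 1.0f);  // factor 1 leaves the data unchanged
    cudaEventRecord(stop);
    // A block size above the device limit shows up here as a launch error.
    if (cudaGetLastError() == cudaSuccess &&
        cudaEventSynchronize(stop) == cudaSuccess) {
        cudaEventElapsedTime(&ms, start, stop);
    }
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

// Try a few block sizes and return the fastest one, or -1 if none launched.
int pickBlockSize(float* d_data, int n) {
    const int candidates[] = {64, 128, 256, 512, 1024};
    timeLaunchMs(d_data, n, 128);  // warm-up launch, result ignored
    int best = -1;
    float bestMs = 0.0f;
    for (int tpb : candidates) {
        float ms = timeLaunchMs(d_data, n, tpb);
        if (ms < 0.0f) continue;  // skip sizes this device rejects
        printf("%4d threads/block: %.3f ms\n", tpb, ms);
        if (best < 0 || ms < bestMs) {
            best = tpb;
            bestMs = ms;
        }
    }
    return best;
}
```

`pickBlockSize` expects a device pointer (from `cudaMalloc`) holding at least `n` floats.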

Enjoy!