Automate number of blocks and threads for block

local_hero · December 16, 2011, 11:32am

Hi people, my kernel function must sum N elements of a vector, one sum per thread, so if I have N values I need N threads. How can I automatically compute the number of blocks and threads per block for the kernel call? I wrote:

if(coefBin<512){
        numBlocks.x=1;
        numThreads.x=coefBin;
}else if( (coefBin/512)<=65535 ){
        numBlocks.x=(coefBin/512)+1;
        numThreads.x=32;
        numThreads.y=16;
}else{
        numBlocks.x=65535;
        numBlocks.y=( (coefBin/512)/65535 )+1;
        numThreads.x=32;
        numThreads.y=16;
}

and then call kernel<<<numBlocks,numThreads>>>(…); but it doesn’t work. Can you help me? Thanks!
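
One common way to size the launch is to round the block count up with a ceiling division and spill into gridDim.y only when gridDim.x would exceed the limit. The following is a minimal host-side sketch in plain C++, assuming the 512-thread block (32 x 16) and the 65535-per-dimension grid limit used in the snippet above. Dim3, LaunchConfig and makeLaunchConfig are hypothetical names; Dim3 stands in for CUDA's dim3 so the logic can be compiled without a GPU. Note that the second branch of the snippet above yields numBlocks.x = 65536 when coefBin/512 equals 65535, which exceeds that limit; the ceiling division avoids this.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for CUDA's dim3 (not the real type).
struct Dim3 { unsigned x, y, z; };

// Hypothetical container for the two kernel launch arguments.
struct LaunchConfig { Dim3 grid; Dim3 block; };

// Assumed limits, taken from the numbers used in the thread.
constexpr unsigned kThreadsPerBlock = 512;  // 32 x 16 threads
constexpr unsigned kMaxGridDim = 65535;     // per grid dimension

LaunchConfig makeLaunchConfig(std::uint64_t n) {
    if (n == 0) throw std::invalid_argument("n must be positive");

    // Fixed 32 x 16 block; the grid starts as a single block.
    LaunchConfig cfg{{1, 1, 1}, {32, 16, 1}};

    // Ceiling division: the smallest block count whose threads cover n.
    std::uint64_t blocks = (n + kThreadsPerBlock - 1) / kThreadsPerBlock;

    if (blocks <= kMaxGridDim) {
        cfg.grid.x = static_cast<unsigned>(blocks);
    } else {
        // Too many blocks for one row: fill x and add rows in y.
        cfg.grid.x = kMaxGridDim;
        std::uint64_t rows = (blocks + kMaxGridDim - 1) / kMaxGridDim;
        if (rows > kMaxGridDim) throw std::out_of_range("n too large for a 2D grid");
        cfg.grid.y = static_cast<unsigned>(rows);
    }
    return cfg;
}
```

For n = 49995000 this gives a 65535 x 2 grid of 32 x 16 blocks. Because the launch usually contains more threads than elements, the kernel still needs a guard such as if (idx < n) before touching the array.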

local_hero · December 16, 2011, 12:02pm

I have observed that the function doesn’t work when gridDim.x = 65535 and gridDim.y is small (2, 3 or 4). Example:

valuesToBeAdded=49995000
DimBlocks=131070, DimThreadsForBlock=512

gridDim.x=65535, gridDim.y=2
blockDim.x=32, blockDim.y=16

(65535*2)*(32*16)=67107840 threads > 49995000

Print results:

QVect_Dev_Ris[0].x=1.000000
…
QVect_Dev_Ris[24997500].x=11413.000000
…
QVect_Dev_Ris[49994999].x=-1998397155538108416.000000 ERROR!!!

Why?
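
Whether every element is reached depends on how the kernel turns blockIdx and threadIdx into an array index, and the kernel is not shown here. As a hedged sketch only, the function below mirrors on the host one conventional flattening of a 2D grid of 2D blocks (the device expressions are in the comment). Dim3, flatIndex and inRange are hypothetical names, not taken from the original code. With this mapping, the launch above assigns element 49994999 to block (32111, 1), thread (23, 7).

```cpp
#include <cstdint>

// Hypothetical stand-in for CUDA's dim3 / uint3 built-in variables.
struct Dim3 { unsigned x, y, z; };

// Host-side mirror of what each thread would compute in a kernel:
//   blockId  = blockIdx.y * gridDim.x + blockIdx.x
//   threadId = threadIdx.y * blockDim.x + threadIdx.x
//   idx      = blockId * (blockDim.x * blockDim.y) + threadId
// gridDim is passed as a whole only to mirror the built-in; x is what is used.
std::uint64_t flatIndex(Dim3 blockIdx, Dim3 threadIdx, Dim3 gridDim, Dim3 blockDim) {
    std::uint64_t blockId =
        static_cast<std::uint64_t>(blockIdx.y) * gridDim.x + blockIdx.x;
    std::uint64_t threadId =
        static_cast<std::uint64_t>(threadIdx.y) * blockDim.x + threadIdx.x;
    std::uint64_t threadsPerBlock =
        static_cast<std::uint64_t>(blockDim.x) * blockDim.y;
    return blockId * threadsPerBlock + threadId;
}

// A thread should only touch the array when its index lies in [0, n).
bool inRange(std::uint64_t idx, std::uint64_t n) {
    return idx < n;
}
```

A kernel using this mapping would wrap its array access in a check equivalent to inRange(idx, n), since this launch has 67107840 threads for 49995000 elements.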

pasoleatis · December 16, 2011, 1:42pm

If you define an array vector[N], you are probably trying to access vector[i] with i < 0 or i > N-1.
You should use cuda-memcheck to see whether your kernel is accessing memory outside the bounds of its arrays. If you are on Linux, add -g -G to the compile command and then run “cuda-memcheck ./your_program”.

local_hero · December 16, 2011, 1:51pm

Then:

========= CUDA-MEMCHECK

dimN=49995000

NumBlocks=131070, NumThreads=512

NumBlocks.x=65535, NumBlocks.y=2

NumThreads.x=32, NumThreads.y=16

QVect_Dev_Ris[0].x=1.000000

QVect_Dev_Ris[24997500].x=11413.000000

QVect_Dev_Ris[49994999].x=-1998397155538108416.000000

========= ERROR SUMMARY: 0 errors

This is the output of cuda-memcheck. Does this mean the accesses do not go out of bounds?

pasoleatis · December 16, 2011, 3:15pm

Yes. It appears that the accesses are OK, so you are just missing something. It is possible that you never calculate the value QVect_Dev_Ris[49994999].x, in which case you get whatever happened to be in memory. You can check this by first initializing the whole array QVect_Dev_Ris with some known number and then seeing whether that number is still there at the end.
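
The check suggested above can be sketched on the host side as follows. This assumes the device buffer was filled with a sentinel before the launch and copied back afterwards (those CUDA calls are omitted). kSentinel and findUnwritten are hypothetical names, and the results are treated as plain floats, whereas the elements of QVect_Dev_Ris have an .x member that would be compared instead.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sentinel: any value the kernel is not expected to produce.
constexpr float kSentinel = -12345.0f;

// Returns the indices that still hold the sentinel, i.e. were never written.
// 'results' is assumed to be the host copy made after the kernel has run on
// a buffer that was pre-filled with kSentinel.
std::vector<std::size_t> findUnwritten(const std::vector<float>& results) {
    std::vector<std::size_t> missing;
    for (std::size_t i = 0; i < results.size(); ++i) {
        // Exact comparison is intended: the sentinel is stored bit-for-bit.
        if (results[i] == kSentinel) missing.push_back(i);
    }
    return missing;
}
```

If the returned list is non-empty, the threads that should have produced those elements either did not run or computed a different index.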

local_hero · December 16, 2011, 6:54pm

But the code worked before. It stopped working when I introduced the optimization of the number of blocks and threads. If, for example, I launch a large number of threads, the code works. Example:

kernel<<<dim3(65535,2000,1),dim3(32,16,1)>>>(…); works!

kernel<<<dim3(65535,4,1),dim3(32,16,1)>>>(…); doesn’t work! However, it should work, because the number of threads is greater than the number of additions I have to do.

I don’t know what to do.

local_hero · December 17, 2011, 4:52pm

Any ideas, pasoleatis?
