What if the number of threads is not divisible by the block size?

Balthazar1 · September 25, 2009, 9:27pm

Hi, I am wondering what the best way is to handle situations where the number of threads is not divisible by the block size. I know I can do the calculations for the extra threads on the CPU, but I want to know if I can do everything on the GPU. I was able to run all the threads on the GPU, and to avoid errors I put the whole body of my kernel function inside an if statement so that it is executed only if the index is less than the number of threads.

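__global__ void kernel(...)
{
    // Global thread index: block offset plus the thread's position in the block.
    int index = __mul24(blockIdx.x, blockDim.x) + threadIdx.x;

    // Surplus threads in the last block skip the work entirely.
    if (index < numBodies) {
        (kernel function)
        ...
    }
}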

Is there any problem with this solution?

thanks

_Big_Mac · September 25, 2009, 9:45pm

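It’s fine. You won’t get any meaningful performance penalty: in every warp except possibly the last one, all threads evaluate the condition to true, so there is no divergent branching. At most the final warp contains a mix of working and idle threads.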

So yes: divide numBodies by the block size, round the result up if the remainder is non-zero, and you have your grid size. Then do this check in the kernel.
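
The round-up and the in-kernel check might fit together roughly as in the sketch below. The names `gridSizeFor`, `scaleBodies` and `launchScale`, the block size of 256, and the doubling operation are illustrative assumptions, not taken from the original post. Plain integer multiplication is used instead of `__mul24` only to keep the sketch simple.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: number of blocks needed to cover n elements.
// Integer ceiling division, i.e. n / blockSize rounded up.
__host__ __device__ inline int gridSizeFor(int n, int blockSize)
{
    return (n + blockSize - 1) / blockSize;
}

// Illustrative kernel: doubles each element.
// Threads whose index falls past the end of the array do nothing.
__global__ void scaleBodies(float *data, int numBodies)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < numBodies) {
        data[index] *= 2.0f;
    }
}

// Hypothetical host-side launcher; d_data must point to device memory
// holding at least numBodies floats.
cudaError_t launchScale(float *d_data, int numBodies)
{
    const int blockSize = 256;  // assumed block size
    const int gridSize  = gridSizeFor(numBodies, blockSize);

    // For numBodies = 1000 this launches 4 blocks (1024 threads);
    // the last 24 threads fail the bounds check and stay idle.
    scaleBodies<<<gridSize, blockSize>>>(d_data, numBodies);
    return cudaGetLastError();
}
```

With 1000 elements and 256 threads per block, only the last warp (indices 992 to 1023) mixes working and idle threads; every other warp passes the check as a whole.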

CaLu · September 25, 2009, 9:52pm

This seems to be a good solution!
In a problem like yours I needed three float values per element, so I used the otherwise empty fourth field of a float4 and filled it with a value. Then I used an if statement to check that value when reading the data.
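
One possible reading of this approach, sketched under assumptions: the array is padded up to a whole number of blocks, and the `w` component marks whether an entry is real data (1.0f) or padding (0.0f). The names `makeItem` and `sumValid`, the marker values, and the summing operation are all hypothetical and not taken from the post.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: pack three floats plus a validity marker into a float4.
// Assumed convention: w == 1.0f for real data, w == 0.0f for padding.
inline float4 makeItem(float x, float y, float z, bool valid)
{
    return make_float4(x, y, z, valid ? 1.0f : 0.0f);
}

// Illustrative kernel: sums the three components of every valid entry.
// paddedCount is assumed to be the array length after padding.
__global__ void sumValid(const float4 *items, int paddedCount, float *out)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index >= paddedCount) {
        return;  // still guard against reading past the allocation
    }

    float4 v = items[index];
    if (v.w != 0.0f) {           // marker check replaces the index comparison
        out[index] = v.x + v.y + v.z;
    }
}
```

Compared with the plain index check, this variant costs one extra float per element, but the 16-byte float4 layout keeps each element aligned for a single vector load.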

cheers,
luca

Balthazar1 · September 25, 2009, 9:52pm

Thanks!