Number of elements not a multiple of blocksize

What if:

N % blocksize != 0

If the number of elements in the matrix isn’t a multiple of the blocksize, or the blocksize is larger than the number of elements in the matrix, the kernel will fail. I know why it fails, but I don’t know how to solve this problem. The second case won’t occur as often as the first, so it isn’t a problem for me, but the first one is, because the size of my dataset isn’t always an even number. I would have to look for a GCD, and that GCD needs to be smaller than 22; otherwise I need to take 1 as the blocksize.

hi,

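you could work out how many whole blocks fit into the matrix, then spend one extra block on the elements that are left over, so that the number of blocks is equal to
gridDim.x = ceilf((float)NumEl / blockDim.x);
and
gridDim.y = ceilf((float)NumEl / blockDim.y);
(The cast matters: with integer operands the division truncates before ceilf is applied. The integer form (NumEl + blockDim.x - 1) / blockDim.x gives the same result.)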

Since you know how many elements there are, you can put a constraint on the thread index inside the kernel, so that threads which would fall outside the matrix do no work and never access memory that isn’t theirs.
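A minimal sketch of this idea for a 1D array, not a definitive implementation. The names blocksFor, scaleByTwo and doubleOnDevice are made up for illustration, and doubling each element is just a stand-in for the real work:

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: blocks needed to cover n elements, rounded up
// in integer arithmetic.
inline unsigned int blocksFor(unsigned int n, unsigned int blockSize) {
    return (n + blockSize - 1) / blockSize;
}

// Hypothetical kernel: doubles each element. Threads whose global index
// falls past the end of the array return without touching memory.
__global__ void scaleByTwo(float *data, unsigned int n) {
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;  // the constraint on the thread index
    data[i] *= 2.0f;
}

// Hypothetical host wrapper: copies n floats to the device, runs the
// kernel with a fixed block size, and copies the result back.
bool doubleOnDevice(float *host, unsigned int n) {
    if (n == 0) return true;            // nothing to launch
    const unsigned int blockSize = 16;  // fixed, independent of n
    const size_t bytes = n * sizeof(float);

    float *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);

    scaleByTwo<<<blocksFor(n, blockSize), blockSize>>>(dev, n);

    cudaError_t err = cudaGetLastError();  // launch configuration errors
    if (err == cudaSuccess)
        err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess;
}
```

The same guard also covers the case where the blocksize is larger than the number of elements: blocksFor then returns 1 and the surplus threads in that single block return immediately.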

hope this helps
greetz,cem

Hi Cem,

Thank you for your response, but what you are doing is just calculating how many blocks I will have to use. What will happen with all the other elements? Let’s say I have N = 1039 and blockDim.x = 16.

Then I have 15 elements left over, which I have to throw away? Or what do you mean by the constraint you are talking about? It is not an option to just throw some elements away. And I never know how big my array is until I have gone through it.

Or I would have to calculate its size first, but I don’t want to do that because it would take too much time.
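For the numbers in this example the arithmetic can be checked directly. A small host-side sketch (the helper names gridSizeFor and idleThreads are made up for illustration) shows that rounding the block count up leaves no element without a thread; it leaves one thread without an element:

```cuda
// Host-side arithmetic only; both helpers are hypothetical names.
constexpr unsigned int gridSizeFor(unsigned int n, unsigned int blockSize) {
    return (n + blockSize - 1) / blockSize;  // integer ceiling division
}

constexpr unsigned int idleThreads(unsigned int n, unsigned int blockSize) {
    return gridSizeFor(n, blockSize) * blockSize - n;
}

// N = 1039, blockDim.x = 16
static_assert(1039 / 16 == 64, "64 blocks are completely full");
static_assert(1039 % 16 == 15, "15 elements are left over");
static_assert(gridSizeFor(1039, 16) == 65, "one extra block holds the 15 leftovers");
static_assert(idleThreads(1039, 16) == 1, "only 1 of the 1040 threads has no element");
```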

There isn’t a better solution. You have to throw something away, but it is threads, not data: the extra threads in the last block simply do nothing. With some well-placed “if (threadIdx.x > some condition) do nothing” type statements you can make a kernel support any size dataset without any problems.

It is much better to do this than to try to adjust the block size with GCDs. Kernel performance can be highly dependent on the block size. Just fix the block size and handle the edge condition in the kernel.
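For a 2D matrix, the same approach might look like the sketch below. It assumes a row-major float matrix and a fixed 16x16 block; the names addOne2D and addOneOnDevice are hypothetical, and adding 1 to each element stands in for the real computation:

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1 to every element of a row-major matrix.
// Threads outside the matrix in either dimension do nothing.
__global__ void addOne2D(float *m, int width, int height) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < width && row < height) {  // edge condition
        m[row * width + col] += 1.0f;
    }
}

// Hypothetical host wrapper: the block size stays fixed and only the
// grid size depends on the matrix dimensions.
bool addOneOnDevice(float *hostM, int width, int height) {
    if (width <= 0 || height <= 0) return true;  // nothing to launch
    const dim3 block(16, 16);
    const dim3 grid((width + block.x - 1) / block.x,
                    (height + block.y - 1) / block.y);
    const size_t bytes = size_t(width) * height * sizeof(float);

    float *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;
    cudaMemcpy(dev, hostM, bytes, cudaMemcpyHostToDevice);

    addOne2D<<<grid, block>>>(dev, width, height);

    cudaError_t err = cudaGetLastError();  // launch configuration errors
    if (err == cudaSuccess)
        err = cudaMemcpy(hostM, dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess;
}
```

With a 37x23 matrix, for example, the grid is 3x2 blocks; the threads in the right-hand and bottom blocks that fall outside the matrix fail the bounds check and leave memory untouched.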