blocksize not a multiple of num_elements

jordyvaneijk · October 22, 2007, 8:49am

What if:

N % blocksize != 0

If the number of elements in the matrix isn't a multiple of the blocksize, or the blocksize is larger than the number of elements in the matrix, the kernel fails. I know why it fails, but I don't know how to solve the problem. The second case won't occur very often, so that won't be a problem for me, but the first one will, because the size of my dataset isn't always a multiple of the blocksize. I would have to look for a GCD, and that GCD needs to be smaller than 22; otherwise I have to use 1 as the blocksize.
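The divisor search described above amounts to finding the largest divisor of N that is at most 21 (the largest blocksize "smaller than 22"). A minimal host-side sketch follows; the function name `largestDividingBlockSize` is hypothetical, chosen only for illustration.

```cuda
// Hypothetical helper illustrating the divisor search: returns the largest
// block size in [1, maxBlock] that divides n exactly. Because 1 divides
// every n, the search always has a fallback.
int largestDividingBlockSize(int n, int maxBlock)
{
    for (int bs = maxBlock; bs > 1; --bs) {
        if (n % bs == 0) {
            return bs;  // first (largest) exact divisor found
        }
    }
    return 1;  // no divisor above 1 fits: one thread per block
}
```

Since 1039 is prime, `largestDividingBlockSize(1039, 21)` returns 1, while `largestDividingBlockSize(1040, 21)` returns 20. A one-element change in N can swing the blocksize from 20 down to 1.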

sicb0161 · October 22, 2007, 9:58am

hi,

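you could determine how many full blocks fit into the matrix, work out how many elements are left over, and spend an extra block on them, so that the number of blocks is rounded up:
gridDim.x = (NumEl + blockDim.x - 1) / blockDim.x;
and
gridDim.y = (NumEl + blockDim.y - 1) / blockDim.y;
(If NumEl is an integer, ceilf(NumEl/blockDim.x) does not round up, because the integer division truncates before ceilf sees the result.)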

Once you know how many elements those extra blocks cover, you can add a condition on the thread index so that threads falling outside the matrix don't access any elements.
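A minimal sketch of this idea for a row-major matrix, assuming a fixed 16 x 16 block. The kernel `scaleMatrix`, the helpers `ceilDiv` and `launchScaleMatrix`, and the separate `width`/`height` sizes are hypothetical names used only for illustration. Each thread computes its global row and column, and the `if` keeps threads in the partially filled edge blocks from touching memory outside the matrix.

```cuda
#include <cuda_runtime.h>

// Integer ceiling division: blocks of size `block` needed to cover `n` items.
inline int ceilDiv(int n, int block)
{
    return (n + block - 1) / block;
}

// Hypothetical example kernel: scales each element of a row-major
// width x height matrix. Threads outside the matrix do nothing.
__global__ void scaleMatrix(float *m, int width, int height, float factor)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < width && row < height) {
        m[row * width + col] *= factor;
    }
}

// Host-side launch with a fixed 16 x 16 block; the grid is rounded up in
// each dimension so the edge blocks cover the leftover rows and columns.
void launchScaleMatrix(float *d_m, int width, int height, float factor)
{
    dim3 block(16, 16);
    dim3 grid(ceilDiv(width, static_cast<int>(block.x)),
              ceilDiv(height, static_cast<int>(block.y)));
    scaleMatrix<<<grid, block>>>(d_m, width, height, factor);
}
```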

hope this helps
greetz,cem

jordyvaneijk · October 22, 2007, 1:38pm

Hi Cem,

Thank you for your response, but what you are describing only calculates how many blocks I need. What happens with the remaining elements? Let's say I have N = 1039 and blockDim.x = 16.

Then I have 15 elements left over. Do I have to throw them away, or is that what you mean by the constraint you mentioned? Just throwing some elements away is not an option. Also, I never know how big my array is until I've gone through it.

The alternative is to calculate its size first, but I don't want to do that because it would take too much time.
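For the numbers in this example, rounding the block count up (as in the reply above) means no element is dropped; only one surplus thread in the last block goes idle:

```latex
\left\lceil \frac{1039}{16} \right\rceil = 65 \text{ blocks}, \qquad
65 \times 16 = 1040 \text{ threads}, \qquad
1040 - 1039 = 1 \text{ idle thread}
```

The first 64 blocks cover elements 0 to 1023. The 65th block covers elements 1024 to 1038 (the 15 "leftover" elements), and its last thread, with global index 1039, fails the bounds check and does nothing.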

MisterAnderson42 · October 22, 2007, 2:25pm

There isn't a better solution. You have to throw some threads away (the surplus ones in the last block), not data. With a well-placed "if (index >= N) do nothing" type statement, where index = blockIdx.x * blockDim.x + threadIdx.x, you can make a kernel support any size dataset without any problems.

It is much better to do this than to try to adjust the block size with GCDs, because kernel performance can depend heavily on the block size. Just fix the block size and handle the edge condition in the kernel.
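This advice in one dimension, as a sketch: the block size is a fixed constant and each thread checks its global index against N before doing any work. The kernel `saxpy`, the helpers `numBlocksFor` and `launchSaxpy`, and the value 256 for `kBlockSize` are assumptions chosen for illustration, not values from this thread.

```cuda
#include <cuda_runtime.h>

// Fixed block size, chosen for performance rather than to divide n.
// 256 is an assumed value; tune it for your kernel and GPU.
constexpr int kBlockSize = 256;

// Number of blocks needed so that blocks * kBlockSize >= n.
inline int numBlocksFor(int n)
{
    return (n + kBlockSize - 1) / kBlockSize;
}

// Hypothetical example kernel: out[i] = a * x[i] + y[i] for i in [0, n).
__global__ void saxpy(int n, float a, const float *x, const float *y, float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) {
        return;  // surplus thread in the last block: do nothing
    }
    out[i] = a * x[i] + y[i];
}

// Launch for any n >= 1; n need not be a multiple of kBlockSize.
void launchSaxpy(int n, float a, const float *d_x, const float *d_y, float *d_out)
{
    saxpy<<<numBlocksFor(n), kBlockSize>>>(n, a, d_x, d_y, d_out);
}
```

With N = 1039 this launches 5 blocks (1280 threads), and the 241 threads with index 1039 or above return immediately. If a kernel also calls `__syncthreads()`, a common precaution is to keep the work inside the `if` instead of returning early, so that no thread skips the barrier.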
