Question about Reduction

The number of elements is supposed to be 2^n.

What if the length is odd, or some other number such as 15, 60, etc.?

For example:
float array[100] = {1,2,3,4,5…100};
I’d like to get the sum of this array.
But the available SDK example only covers the case where the length of the array is 2^n; it does not support odd lengths.

Maybe you can give some more information about what you are trying to achieve here?

I was wondering whether there is an efficient kernel-based way to get the sum of the array above.


Often you just pad with zeros: if your array has 100 elements, pad it to 128 elements, where the extra 28 elements are zero. In some cases it’s faster to add 128 numbers than to add 100 elements with a lot of additional code that complicates the parallel algorithm ;-)
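The padding idea above could be sketched on the host side roughly as follows. This is only an illustration: the helper names nextPow2 and uploadPadded are made up for this post and are not part of the SDK, and error handling is kept to a minimum.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper: smallest power of two that is >= n (for n >= 1).
static size_t nextPow2(size_t n)
{
    size_t p = 1;
    while (p < n) p *= 2;
    return p;
}

// Hypothetical helper: copy n host floats into a zero-padded device buffer
// whose length is a power of two. The padded length is returned through
// *paddedLen. The caller owns the returned pointer and must cudaFree it.
static float *uploadPadded(const float *h_data, size_t n, size_t *paddedLen)
{
    *paddedLen = nextPow2(n);
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, *paddedLen * sizeof(float)) != cudaSuccess)
        return nullptr;
    // All-zero bytes are 0.0f, and zero is the identity for addition,
    // so the padding does not change the sum.
    cudaMemset(d_data, 0, *paddedLen * sizeof(float));
    cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);
    return d_data;
}
```

For the 100-element array in the question, nextPow2(100) gives 128, so the buffer holds the 100 real values followed by 28 zeros and can be passed to a power-of-two reduction unchanged.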

Another option is to pass the data size as a parameter to the kernel, then add a check inside the kernel so that it only reads values that lie inside the array. For example, in the first reduce kernel from the SDK:

__global__ void
reduce0(int *g_idata, int *g_odata, const unsigned int size)
{
    extern __shared__ int sdata[];

    // load shared mem; threads past the end of the array contribute 0
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x*blockDim.x + threadIdx.x;
    sdata[tid] = (i < size) ? g_idata[i] : 0;                         // <---this line changed
    __syncthreads();   // all loads must finish before the reduction starts

    // do reduction in shared mem
    for(unsigned int s=1; s < blockDim.x; s *= 2) {
        // modulo arithmetic is slow!
        if ((tid % (2*s)) == 0) {
            sdata[tid] += sdata[tid + s];
        }
        __syncthreads();   // finish this step before the stride doubles
    }

    // write result for this block to global mem
    if ((i < size) && (tid == 0)) g_odata[blockIdx.x] = sdata[0];               // <---this line changed
}


You also need to change the code calling the kernel to launch enough blocks to cover the entire array.

(This is just to illustrate the basic idea - I haven’t compiled or tested this, but I use a similar method in parts of my code.)
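To show what "launch enough blocks" could look like, here is a rough host-side sketch. The names reduceGuarded and sumOnDevice and the block size of 128 threads are assumptions made for this illustration, not SDK code. The block count is rounded up so the last, partly filled block still runs, and the per-block partial sums are added on the CPU. Note that the block size itself still has to be a power of two, otherwise sdata[tid + s] can index past the end of shared memory. Error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Same idea as the kernel above: out-of-range threads contribute 0.
__global__ void reduceGuarded(const int *g_idata, int *g_odata, unsigned int size)
{
    extern __shared__ int sdata[];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    sdata[tid] = (i < size) ? g_idata[i] : 0;
    __syncthreads();

    for (unsigned int s = 1; s < blockDim.x; s *= 2) {
        if ((tid % (2 * s)) == 0) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }

    if (tid == 0) g_odata[blockIdx.x] = sdata[0];
}

// Hypothetical host wrapper: one kernel pass, then the per-block
// partial sums are added on the CPU.
int sumOnDevice(const std::vector<int> &h_in)
{
    const unsigned int n = static_cast<unsigned int>(h_in.size());
    if (n == 0) return 0;

    const unsigned int threads = 128;                         // assumed; must be a power of two
    const unsigned int blocks = (n + threads - 1) / threads;  // round up to cover the whole array

    int *d_in = nullptr;
    int *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, blocks * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    // Third launch parameter: bytes of dynamic shared memory per block.
    reduceGuarded<<<blocks, threads, threads * sizeof(int)>>>(d_in, d_out, n);

    std::vector<int> partial(blocks);
    cudaMemcpy(partial.data(), d_out, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    int total = 0;
    for (int p : partial) total += p;
    return total;
}
```

With 100 elements and 128 threads per block this launches a single block in which 28 threads load zeros. For larger arrays the final CPU loop could be replaced by a second kernel pass over the partial sums.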

Thank you for your excellent answers; they should be very helpful.

I will give it a try.