 # parallel reduction scope difference of two arrays

Hi,

Another CUDA newbie question:
Can someone shed light on whether there is scope for a parallel reduction in the scenario below?

for (j = 0; j < numiteration; j++)
{
    if (supreme < fabs(d_b[j] - d_a[j]))
        supreme = fabs(d_b[j] - d_a[j]);
}

Can I safely use the typical parallel implementation (as in the sum/max reduction), i.e. recursively halve the active threads and use shared memory to hold supreme?

Any pointers to similar reduction examples would be a great help.

Thanks much.

The inner_product function in Thrust makes this easy to do:

[codebox]#include <thrust/inner_product.h>
#include <thrust/functional.h>
#include <thrust/device_vector.h>
#include <cmath>
#include <iostream>

// this example computes the maximum absolute difference
// between the elements of two vectors

struct abs_diff
{
    template <typename T>
    __host__ __device__
    T operator()(const T x, const T y)
    {
        // absolute difference of one pair of elements
        return fabs(y - x);
    }
};

int main(void)
{
    thrust::device_vector<float> d_a(4);
    thrust::device_vector<float> d_b(4);

    d_a[0] = 1.0;  d_b[0] = 2.0;
    d_a[1] = 2.0;  d_b[1] = 4.0;
    d_a[2] = 3.0;  d_b[2] = 3.0;
    d_a[3] = 4.0;  d_b[3] = 0.0;

    // initial value of the reduction
    float init = 0;

    // binary operations: binary_op2 is applied to each pair of elements,
    // binary_op1 reduces the results
    thrust::maximum<float> binary_op1;
    abs_diff               binary_op2;

    float max_abs_diff = thrust::inner_product(d_a.begin(), d_a.end(), d_b.begin(), init, binary_op1, binary_op2);

    std::cout << "maximum absolute difference: " << max_abs_diff << std::endl;

    return 0;
}[/codebox]

This example uses Thrust’s device_vector container, but that’s not necessary. You can wrap a “raw” pointer with thrust::device_ptr as this example shows.
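In case it helps, here is a rough sketch of the raw-pointer variant. It assumes d_a and d_b are plain device pointers to float (for example from cudaMalloc), as in the original loop. The helper name max_abs_diff_raw is made up for illustration and is not part of Thrust, and the functor is fixed to float here rather than templated.

```cuda
#include <thrust/device_ptr.h>
#include <thrust/functional.h>
#include <thrust/inner_product.h>
#include <cmath>

// Same idea as the functor in the example above: |y - x| for one pair.
struct abs_diff_f
{
    __host__ __device__
    float operator()(float x, float y) const
    {
        return fabsf(y - x);
    }
};

// Hypothetical helper (not a Thrust function): d_a and d_b are raw device
// pointers, each pointing at n floats in device memory.
float max_abs_diff_raw(const float* d_a, const float* d_b, int n)
{
    // Wrapping the raw pointers tells Thrust the data lives on the device.
    thrust::device_ptr<const float> a = thrust::device_pointer_cast(d_a);
    thrust::device_ptr<const float> b = thrust::device_pointer_cast(d_b);

    // max over j of |d_b[j] - d_a[j]|, starting from an initial value of 0
    return thrust::inner_product(a, a + n, b, 0.0f,
                                 thrust::maximum<float>(), abs_diff_f());
}
```

With this, the loop from the question would reduce to something like supreme = max_abs_diff_raw(d_a, d_b, numiteration); and the arrays never have to be copied into a device_vector.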

Thanks so much…made it so lucid !!