Scope for parallel reduction: maximum absolute difference of two arrays


Another CUDA newbie question:
Can someone shed light on whether there is scope for a parallel reduction in the scenario below?

for (j = 0; j < numiteration; j++)
    if (supreme < fabs(d_b[j] - d_a[j]))
        supreme = fabs(d_b[j] - d_a[j]);

Can I safely apply the typical parallel reduction pattern (as used for sum/max reductions): recursively halve the number of active threads and use shared memory to hold the running supreme?

Any pointers to similar reduction examples would be a great help.

Thanks much.
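On the shared-memory approach asked about above: it should be safe, because max is associative and commutative, and 0 is a valid identity since absolute differences are never negative. The order in which threads combine values therefore does not change the result (unlike a floating-point sum). Below is a minimal sketch of that tree reduction, assuming a power-of-two block size, no error checking, and that the small array of per-block results is finished on the host. The names `kBlockSize`, `maxAbsDiffKernel` and `maxAbsDiff` are hypothetical, chosen for illustration.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

// Hypothetical block size; must be a power of two for the halving loop below.
constexpr unsigned int kBlockSize = 256;

// Each block reduces its grid-stride slice of |b[i] - a[i]| to one partial maximum.
__global__ void maxAbsDiffKernel(const float* a, const float* b, int n, float* blockMax)
{
    __shared__ float sdata[kBlockSize];
    unsigned int tid = threadIdx.x;

    // Grid-stride loop: each thread keeps its own running maximum ("supreme").
    // 0.0f works as the identity because absolute differences are >= 0.
    float local = 0.0f;
    for (int i = blockIdx.x * blockDim.x + tid; i < n; i += blockDim.x * gridDim.x)
        local = fmaxf(local, fabsf(b[i] - a[i]));
    sdata[tid] = local;
    __syncthreads();

    // Tree reduction in shared memory: halve the number of active threads each step.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            sdata[tid] = fmaxf(sdata[tid], sdata[tid + s]);
        __syncthreads();  // outside the if, so every thread reaches it
    }

    // Thread 0 writes this block's partial result.
    if (tid == 0)
        blockMax[blockIdx.x] = sdata[0];
}

// Host wrapper (hypothetical): d_a and d_b are device pointers holding n floats each.
float maxAbsDiff(const float* d_a, const float* d_b, int n)
{
    int blocks = (n + kBlockSize - 1) / kBlockSize;
    blocks = std::min(std::max(blocks, 1), 1024);  // arbitrary cap; the grid-stride loop covers the rest

    float* d_partial = nullptr;
    cudaMalloc(&d_partial, blocks * sizeof(float));
    maxAbsDiffKernel<<<blocks, kBlockSize>>>(d_a, d_b, n, d_partial);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), d_partial, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_partial);

    // Finish the few per-block maxima on the host.
    float result = 0.0f;
    for (float p : partial)
        result = std::max(result, p);
    return result;
}
```

Usage: copy both arrays to the device with `cudaMalloc`/`cudaMemcpy`, then call `maxAbsDiff(d_a, d_b, numiteration)`. Finishing on the host keeps the sketch short; a second kernel launch over the partial results would avoid the extra copy for very large grids.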

The inner_product function in Thrust makes this easy to do:

[codebox]#include <thrust/inner_product.h>
#include <thrust/functional.h>
#include <thrust/device_vector.h>
#include <iostream>
#include <cmath>

// this example computes the maximum absolute difference
// between the elements of two vectors

// binary_op2: absolute difference of one pair of elements
struct abs_diff
{
    template <typename T>
    __host__ __device__
    T operator()(const T& a, const T& b) const
    {
        return fabs(b - a);
    }
};

int main(void)
{
    thrust::device_vector<float> d_a(4);
    thrust::device_vector<float> d_b(4);

    d_a[0] = 1.0;  d_b[0] = 2.0;
    d_a[1] = 2.0;  d_b[1] = 4.0;
    d_a[2] = 3.0;  d_b[2] = 3.0;
    d_a[3] = 4.0;  d_b[3] = 0.0;

    // initial value of the reduction
    float init = 0;

    // binary operations:
    // binary_op2 maps each pair (a[i], b[i]) to |b[i] - a[i]|,
    // binary_op1 combines those values (and init) with max
    thrust::maximum<float> binary_op1;
    abs_diff               binary_op2;

    float max_abs_diff = thrust::inner_product(d_a.begin(), d_a.end(), d_b.begin(), init, binary_op1, binary_op2);

    std::cout << "maximum absolute difference: " << max_abs_diff << std::endl;

    return 0;
}[/codebox]
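For reference, with `binary_op1 = thrust::maximum` and `binary_op2 = abs_diff`, `inner_product` reduces to a max over the element-wise absolute differences, seeded with `init`. Worked through on the sample data:

$$
\text{max\_abs\_diff} = \max\bigl(\text{init},\ |b_0 - a_0|,\ \dots,\ |b_{n-1} - a_{n-1}|\bigr)
= \max\bigl(0,\ |2-1|,\ |4-2|,\ |3-3|,\ |0-4|\bigr) = \max(0, 1, 2, 0, 4) = 4
$$

so the program prints `maximum absolute difference: 4`.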
This example uses Thrust’s device_vector container, but that’s not necessary. You can wrap a “raw” pointer with thrust::device_ptr as this example shows.
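A minimal sketch of the raw-pointer variant, assuming the two arrays were already allocated on the device with `cudaMalloc`. `abs_diff_f` (a float-only version of the `abs_diff` functor) and `maxAbsDiffRaw` are hypothetical names for illustration.

```cuda
#include <thrust/inner_product.h>
#include <thrust/functional.h>
#include <thrust/device_ptr.h>
#include <cuda_runtime.h>
#include <math.h>

// Hypothetical float-only functor: |b - a| for one pair of elements.
struct abs_diff_f
{
    __host__ __device__
    float operator()(const float a, const float b) const
    {
        return fabsf(b - a);
    }
};

// Hypothetical helper: d_a and d_b are plain device pointers (e.g. from cudaMalloc) holding n floats.
float maxAbsDiffRaw(const float* d_a, const float* d_b, int n)
{
    // Wrapping the raw pointers tells Thrust the data lives in device memory,
    // so inner_product runs on the GPU instead of treating them as host pointers.
    thrust::device_ptr<const float> a = thrust::device_pointer_cast(d_a);
    thrust::device_ptr<const float> b = thrust::device_pointer_cast(d_b);

    // init = 0.0f; combine with max, transform each pair with |b - a|.
    return thrust::inner_product(a, a + n, b, 0.0f,
                                 thrust::maximum<float>(), abs_diff_f());
}
```

Passing the unwrapped `float*` directly would make Thrust assume host memory, which is why the `device_ptr` wrapper matters here.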

Thanks so much… that made it so lucid!