operation that returns a single value only?

Hi there,

I am trying to perform an operation that would look like this on the CPU:

int array[ARRAY_SIZE];
int sum(0);
for (int i(0); i < ARRAY_SIZE; ++i)
    sum += array[i];

What would be the best strategy for doing that on the GPU using CUDA?

The reason I don’t want to do it on the CPU is to avoid a memory transfer between the GPU and the CPU (in practice, ‘array’ is generated and held on the GPU).


What you want to do is called a parallel reduction. The CUDA SDK contains a highly optimised reduction kernel, along with a white paper that discusses how it works.
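To give a rough idea of the technique, here is a minimal sketch of a shared-memory tree reduction. It is not the SDK kernel and is far less optimised. The names `sumKernel`, `fillOnes` and `BLOCK_SIZE`, the grid size of 64 blocks, and the all-ones test data are assumptions made for illustration only, and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed block size; the tree loop below needs a power of two,
// and the kernel must be launched with exactly this many threads.
constexpr int BLOCK_SIZE = 256;

// Illustrative helper: stands in for whatever generates 'array' on the GPU.
__global__ void fillOnes(int* data, int n)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x)
        data[i] = 1;
}

// Each block reduces its share of 'in' in shared memory, then thread 0
// adds the block's partial sum to *out with a single atomicAdd.
__global__ void sumKernel(const int* in, int* out, int n)
{
    __shared__ int partial[BLOCK_SIZE];

    // Grid-stride loop: every thread first accumulates a private sum.
    int local = 0;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x)
        local += in[i];

    partial[threadIdx.x] = local;
    __syncthreads();

    // Tree reduction: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        atomicAdd(out, partial[0]);
}

int main()
{
    const int n = 1 << 20;          // assumed array size
    int* d_array = nullptr;
    int* d_sum = nullptr;
    cudaMalloc(&d_array, n * sizeof(int));
    cudaMalloc(&d_sum, sizeof(int));
    cudaMemset(d_sum, 0, sizeof(int));   // accumulator must start at zero

    fillOnes<<<64, BLOCK_SIZE>>>(d_array, n);
    sumKernel<<<64, BLOCK_SIZE>>>(d_array, d_sum, n);

    // Only needed if the host wants the value: a single int is copied.
    int h_sum = 0;
    cudaMemcpy(&h_sum, d_sum, sizeof(int), cudaMemcpyDeviceToHost);
    printf("sum = %d\n", h_sum);

    cudaFree(d_array);
    cudaFree(d_sum);
    return 0;
}
```

The result lives in `d_sum` on the device, so a later kernel can read it without any transfer. If the host does need it, only one `int` is copied back rather than the whole array.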

Cheers, that’s exactly what I was looking for.