operation that returns a single value only?

Hi there,

I am trying to perform the following operation, written here as it would run on the CPU:

int array[ARRAY_SIZE];
int sum(0);
for (int i(0); i < ARRAY_SIZE; ++i)
    sum += array[i];

What would be the best strategy for doing that on the GPU using CUDA?

The reason I don’t want to do it on the CPU is to avoid a memory transfer between the GPU and the CPU (in practice, ‘array’ has been generated and is held on the GPU).


You should probably take a look at the CUDA Reduction example :-)

And don’t stop at the example: read some articles about this ‘reduction algorithm’ as well. You’ll come across it in many forms in your own solutions later on. I think it is a very important algorithm to understand.
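
To make the idea concrete, here is a minimal sketch of a sum reduction. It is not the code from the CUDA Reduction example, only a simplified illustration of the same technique. The names sumKernel, fillOnes, d_array and d_sum, the array size, and the launch configuration (64 blocks of 256 threads) are all assumptions made for this sketch. Each block sums its share of the array in shared memory with a tree reduction, then thread 0 adds the block's partial sum to a single device-side result with atomicAdd. Error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int ARRAY_SIZE = 1 << 20;  // assumed size, for illustration only
constexpr int BLOCK_SIZE = 256;      // threads per block; must be a power of two
constexpr int NUM_BLOCKS = 64;       // assumed grid size, not tuned

// Sums the n elements of `in` and accumulates the total into *out.
// Must be launched with exactly BLOCK_SIZE threads per block,
// and *out must be zeroed before the launch.
__global__ void sumKernel(const int* in, int* out, int n)
{
    __shared__ int partial[BLOCK_SIZE];

    int tid = threadIdx.x;

    // Grid-stride loop: each thread first accumulates several elements
    // into a private register.
    int local = 0;
    for (int i = blockIdx.x * blockDim.x + tid; i < n; i += blockDim.x * gridDim.x)
        local += in[i];

    partial[tid] = local;
    __syncthreads();

    // Tree reduction in shared memory: halve the number of active
    // threads at every step until partial[0] holds the block's sum.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            partial[tid] += partial[tid + stride];
        __syncthreads();
    }

    // One atomic add per block combines the partial sums.
    if (tid == 0)
        atomicAdd(out, partial[0]);
}

// Hypothetical stand-in for whatever kernel generates `array` on the GPU.
__global__ void fillOnes(int* data, int n)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += blockDim.x * gridDim.x)
        data[i] = 1;
}

int main()
{
    int* d_array = nullptr;
    int* d_sum = nullptr;
    cudaMalloc(&d_array, ARRAY_SIZE * sizeof(int));
    cudaMalloc(&d_sum, sizeof(int));
    cudaMemset(d_sum, 0, sizeof(int));

    fillOnes<<<NUM_BLOCKS, BLOCK_SIZE>>>(d_array, ARRAY_SIZE);
    sumKernel<<<NUM_BLOCKS, BLOCK_SIZE>>>(d_array, d_sum, ARRAY_SIZE);

    // Copied back here only so the result can be printed; other kernels
    // can read d_sum directly without any transfer.
    int h_sum = 0;
    cudaMemcpy(&h_sum, d_sum, sizeof(int), cudaMemcpyDeviceToHost);
    printf("sum = %d\n", h_sum);

    cudaFree(d_array);
    cudaFree(d_sum);
    return 0;
}
```

The result lives in d_sum on the device, so if later kernels need it, no GPU to CPU transfer is required at all. If you do need it on the host, only a single int is copied instead of the whole array. Since every element is set to 1 in this sketch, the printed sum should equal ARRAY_SIZE. The SDK example applies further optimizations that are not shown here.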