Computing an array average...?

There’s something very basic about CUDA that I fail to understand.
I need to compute the average of a given (large) array using CUDA.
The naive way is to sum all the elements of the array (sum = sum + arr[i]) and then divide by the length of the array.
If I do that on the GPU, then sum is a single location in memory that all the threads read and write at the same time, so updates get lost. This gives the wrong result, of course.

How can such a simple thing be achieved?


Look at the scan example in the SDK. Scan applies an associative binary operation (here, addition) across all of the elements of a list in parallel and produces the running totals. The last total of an inclusive scan is the sum of the whole array, so you can use it to compute the sum and then just perform a division at the end.
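To make the idea concrete, here is a rough sketch, not the SDK's code. Note that it uses a plain parallel reduction rather than a full scan, since only the final total is needed for an average. Each block sums its share of the array in shared memory, and one thread per block adds that partial sum to a global total with atomicAdd. The names blockReduceSum, averageOnGpu and kThreads are made up for illustration. It assumes n > 0 and a GPU that supports atomicAdd on float, and it leaves out error checking to stay short.

```cuda
#include <cuda_runtime.h>

// Hypothetical block size; the tree reduction below assumes a power of two.
constexpr int kThreads = 256;

// Each block reduces its portion of arr in shared memory, then thread 0
// adds the block's partial sum to *total (one atomic per block, not per element).
__global__ void blockReduceSum(const float* arr, int n, float* total) {
    __shared__ float partial[kThreads];
    int tid = threadIdx.x;

    // Grid-stride loop: each thread accumulates a private sum first.
    float acc = 0.0f;
    for (int i = blockIdx.x * blockDim.x + tid; i < n; i += blockDim.x * gridDim.x) {
        acc += arr[i];
    }
    partial[tid] = acc;
    __syncthreads();

    // Tree reduction: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) {
            partial[tid] += partial[tid + stride];
        }
        __syncthreads();
    }

    if (tid == 0) {
        atomicAdd(total, partial[0]);
    }
}

// Host helper (hypothetical): copies the array to the GPU, sums it there,
// and divides by n on the host. Assumes n > 0; error checks omitted.
float averageOnGpu(const float* hostArr, int n) {
    float* dArr = nullptr;
    float* dTotal = nullptr;
    cudaMalloc(&dArr, n * sizeof(float));
    cudaMalloc(&dTotal, sizeof(float));
    cudaMemcpy(dArr, hostArr, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dTotal, 0, sizeof(float));

    int blocks = (n + kThreads - 1) / kThreads;
    if (blocks > 1024) blocks = 1024;  // arbitrary cap; the grid-stride loop covers the rest
    blockReduceSum<<<blocks, kThreads>>>(dArr, n, dTotal);

    float total = 0.0f;
    cudaMemcpy(&total, dTotal, sizeof(float), cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(dArr);
    cudaFree(dTotal);
    return total / n;
}
```

Calling averageOnGpu on a host array holding 0, 1, ..., 1023 should give 511.5. For very large arrays, float accumulation loses precision, so accumulating in double (where the hardware supports atomicAdd on double) or doing a second reduction pass over the per-block partial sums may be a better choice.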

This can be done using the scan algorithm.

Thank you, I will look into it…