There's something very basic about CUDA that I fail to understand:
I need to compute an average for a given (large) array using CUDA.
The naive way is to sum all the elements of the array (sum = sum + arr[i]) and then divide by the length of the array.
If I do that on the GPU, then sum is a single location in memory that all the threads read and write at the same time, with no synchronization between them. Of course, this gives the wrong result.
How can such a simple thing be done correctly?
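
For reference, the pattern usually suggested for this kind of problem is a parallel reduction: each thread sums its own elements privately, the threads of a block combine their partial sums in shared memory, and one thread per block adds the block's result to the global total with atomicAdd. The following is only a minimal sketch under some assumptions: the names sum_kernel and average_on_gpu are made up for illustration, the data is float, the block size is a power of two, the device supports atomicAdd on float, and error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: every block reduces its share of the input in shared
// memory, then thread 0 of the block adds the block's sum to *total.
__global__ void sum_kernel(const float* in, int n, float* total)
{
    extern __shared__ float partial[];   // one slot per thread in the block

    int tid = threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    float local = 0.0f;

    // Grid-stride loop: each thread accumulates into its own register,
    // so there is no shared write here and no race.
    for (int i = blockIdx.x * blockDim.x + tid; i < n; i += stride)
        local += in[i];

    partial[tid] = local;
    __syncthreads();

    // Tree reduction inside the block (assumes blockDim.x is a power of two).
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            partial[tid] += partial[tid + s];
        __syncthreads();
    }

    // Only one atomic update per block touches the global total.
    if (tid == 0)
        atomicAdd(total, partial[0]);
}

// Hypothetical host helper: copies the array to the GPU, sums it there,
// and returns sum / n. Error checking is omitted in this sketch.
float average_on_gpu(const float* host, int n)
{
    if (n <= 0)
        return 0.0f;

    const int threads = 256;                       // power of two
    int blocks = (n + threads - 1) / threads;
    if (blocks > 1024)
        blocks = 1024;                             // grid-stride loop covers the rest

    float* d_in = nullptr;
    float* d_total = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_total, sizeof(float));
    cudaMemcpy(d_in, host, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_total, 0, sizeof(float));         // all-zero bits == 0.0f

    sum_kernel<<<blocks, threads, threads * sizeof(float)>>>(d_in, n, d_total);

    float total = 0.0f;
    cudaMemcpy(&total, d_total, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_total);
    return total / n;
}
```

Calling average_on_gpu(arr, n) from host code with an ordinary float array would return the mean. Note that for very large arrays a single-precision sum can lose accuracy, and the order of the atomic additions is not fixed, so the last digits of the result may vary between runs.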