On the CPU, the operation I want to perform looks like this:
```cpp
for (int i(0); i < ARRAY_SIZE; ++i)
    sum += array[i];
```
What is the best strategy for doing this on the GPU with CUDA?
I don't want to do it on the CPU because I want to avoid a memory transfer from the GPU to the CPU. In practice, `array` is generated on the GPU and already resides in device memory.
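A common approach is a parallel reduction. Below is a minimal sketch of that technique, not a definitive implementation. Each block sums its share of the array with a grid-stride loop, combines the per-thread values in shared memory with a tree reduction, and then adds the block's partial sum to a single device scalar with `atomicAdd`. Only the final 4-byte result is copied back to the host. The names `sumKernel`, `gpuSum`, `kBlockSize` and `CUDA_CHECK` are hypothetical. The block size of 256 and the 1024-block grid cap are assumed tuning values. `atomicAdd` on `float` requires compute capability 2.0 or later.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

constexpr int kBlockSize = 256;  // assumed threads per block; must be a power of two

// Each block reduces a strided slice of `in` in shared memory, then thread 0
// adds the block's partial sum to *out with a single atomicAdd.
__global__ void sumKernel(const float* in, int n, float* out) {
    __shared__ float partial[kBlockSize];

    // Grid-stride loop: each thread accumulates several elements first,
    // so the grid does not need one thread per array element.
    float local = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x) {
        local += in[i];
    }
    partial[threadIdx.x] = local;
    __syncthreads();

    // Tree reduction in shared memory: halve the active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        }
        __syncthreads();
    }

    if (threadIdx.x == 0) {
        atomicAdd(out, partial[0]);
    }
}

// Sums n floats that already live in device memory and returns the result
// on the host. Only the final scalar crosses the PCIe bus.
float gpuSum(const float* d_array, int n) {
    assert(n >= 0 && "element count must be non-negative");

    float* d_sum = nullptr;
    CUDA_CHECK(cudaMalloc(&d_sum, sizeof(float)));
    CUDA_CHECK(cudaMemset(d_sum, 0, sizeof(float)));  // all-zero bytes == 0.0f

    int blocks = (n + kBlockSize - 1) / kBlockSize;
    if (blocks > 1024) blocks = 1024;  // assumed cap; the stride loop covers the rest
    if (blocks < 1) blocks = 1;
    sumKernel<<<blocks, kBlockSize>>>(d_array, n, d_sum);
    CUDA_CHECK(cudaGetLastError());

    // cudaMemcpy on the default stream waits for the kernel to finish.
    float h_sum = 0.0f;
    CUDA_CHECK(cudaMemcpy(&h_sum, d_sum, sizeof(float), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_sum));
    return h_sum;
}
```

With `d_array` pointing at the existing device buffer, the call would be `float sum = gpuSum(d_array, ARRAY_SIZE);`. Because the additions happen in a different order than in the sequential CPU loop, and the atomic updates arrive in a nondeterministic order, the result can differ from the CPU sum in the last bits. Library routines such as Thrust's `thrust::reduce` or CUB's `cub::DeviceReduce::Sum` perform the same kind of reduction and are usually better tuned than a hand-written kernel.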