I want to sum all the elements in a large array. I tried parallel inclusive and exclusive scan, but it only worked for small array sizes. Can anybody please help me out with this? Solutions and suggestions would be appreciated.
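Have you already looked at the ‘scanLargeArray’ example in the NVIDIA CUDA SDK?
It uses a very good approach to tackle this: it scans the array in block-sized chunks, then scans the per-block sums and adds them back to each chunk, so the array size is not limited to one block.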
You can try the code posted here:
It’s very simple code: essentially each warp reduces A LOT of elements on its own, and then it finishes with a warp reduce.
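For anyone who just wants to see the shape of that idea, here is a minimal sketch. It is not the code from the link, only my own illustration under a few assumptions: CUDA 9 or newer (for `__shfl_down_sync`), a block size that is a multiple of 32, and `float` data. The names `warpReduceSum`, `sumKernel` and `sumArray` are made up for this example, the 256x256 launch configuration is arbitrary, and error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Sum the values held by the 32 lanes of one warp using shuffles.
// After the loop, lane 0 holds the warp's total.
__inline__ __device__ float warpReduceSum(float v) {
    for (int offset = warpSize / 2; offset > 0; offset >>= 1)
        v += __shfl_down_sync(0xffffffffu, v, offset);
    return v;
}

// Each thread first accumulates many elements with a grid-stride loop,
// so the array can be far larger than the number of threads launched.
// Then each warp reduces its 32 partial sums, and lane 0 of every warp
// adds the warp total to the single output value.
__global__ void sumKernel(const float* in, float* out, size_t n) {
    float acc = 0.0f;
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        acc += in[i];

    acc = warpReduceSum(acc);
    if ((threadIdx.x & (warpSize - 1)) == 0)
        atomicAdd(out, acc);
}

// Hypothetical host helper: copies the data to the GPU, runs the kernel
// and returns the sum. Return codes are ignored here for brevity.
float sumArray(const float* hostData, size_t n) {
    float* dIn = nullptr;
    float* dOut = nullptr;
    float result = 0.0f;

    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, sizeof(float));
    cudaMemcpy(dIn, hostData, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dOut, 0, sizeof(float));

    // Block size must be a multiple of 32 so every warp is full.
    sumKernel<<<256, 256>>>(dIn, dOut, n);

    // This copy waits for the kernel to finish before reading the result.
    cudaMemcpy(&result, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

Calling `sumArray(data, n)` from host code returns the total. Since the atomic adds land in no fixed order, `float` results can differ slightly between runs on data that does not sum exactly; accumulating in `double` is one way to reduce that if it matters.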