Efficient way to split an array on the device into two?

Hi all,

I am looking for an efficient way to sum all the elements of an array, i.e. a reduction. My idea is to split the array into two arrays of equal length, add them element-wise in parallel, and then apply the same step recursively to the result. What is a good way to split an array on the device into two? Or is there an alternative way to compute the sum?




Download the CUDA SDK and look at the reduction sample. The whitepaper that comes with it also contains a lot of information.
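
To give a rough idea of the approach before you dig into the sample: you normally don't split the array physically at all. Each thread just adds the element at its own index to the one a fixed stride further along, and the stride is halved every step. Below is a minimal sketch of that idea, not the SDK sample itself. The names blockSum and deviceSum are made up for illustration, the block size of 256 is an arbitrary power of two, and error checking on the CUDA calls is left out for brevity.

```cuda
#include <utility>
#include <vector>
#include <cuda_runtime.h>

// Each block sums up to 2 * blockDim.x input elements into one partial sum.
// Assumes blockDim.x is a power of two (hypothetical kernel, for illustration).
__global__ void blockSum(const float* in, float* out, unsigned n)
{
    extern __shared__ float s[];
    unsigned tid = threadIdx.x;
    unsigned i = blockIdx.x * blockDim.x * 2 + tid;

    // First "halving" happens while loading: add the element one block-width away.
    float v = 0.0f;
    if (i < n) v += in[i];
    if (i + blockDim.x < n) v += in[i + blockDim.x];
    s[tid] = v;
    __syncthreads();

    // Tree reduction in shared memory: the lower half adds the upper half,
    // then the active range is halved. No copying into separate arrays.
    for (unsigned stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = s[0];
}

// Host helper (hypothetical): repeats the kernel until one value is left.
float deviceSum(const std::vector<float>& h)
{
    const unsigned threads = 256;
    unsigned n = static_cast<unsigned>(h.size());
    if (n == 0) return 0.0f;

    unsigned blocks = (n + threads * 2 - 1) / (threads * 2);
    float *a = nullptr, *b = nullptr;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, blocks * sizeof(float));
    cudaMemcpy(a, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    while (n > 1) {
        blocks = (n + threads * 2 - 1) / (threads * 2);
        blockSum<<<blocks, threads, threads * sizeof(float)>>>(a, b, n);
        std::swap(a, b);   // partial sums become the input of the next pass
        n = blocks;
    }

    float result = 0.0f;
    cudaMemcpy(&result, a, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(a);
    cudaFree(b);
    return result;
}
```

Calling deviceSum on a std::vector<float> copies it to the device and returns the total. The SDK sample and whitepaper go well beyond this and cover further optimizations, so treat the sketch only as a starting point.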

Or you could use the Thrust library, which provides thrust::reduce.
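
A minimal sketch of the Thrust route, assuming the data is either in a thrust::device_vector or in a raw device pointer that you wrap with thrust::device_pointer_cast. The function names thrustSum and thrustSumRaw are just illustrative wrappers, not part of Thrust.

```cuda
#include <cstddef>
#include <thrust/device_ptr.h>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/reduce.h>

// Sum a device_vector (hypothetical wrapper around thrust::reduce).
float thrustSum(const thrust::device_vector<float>& d)
{
    // 0.0f is the initial value, thrust::plus<float> the binary operation.
    return thrust::reduce(d.begin(), d.end(), 0.0f, thrust::plus<float>());
}

// Sum n floats behind a raw pointer obtained from cudaMalloc.
float thrustSumRaw(float* raw, std::size_t n)
{
    // Wrapping the pointer tells Thrust that the memory lives on the device.
    thrust::device_ptr<float> p = thrust::device_pointer_cast(raw);
    return thrust::reduce(p, p + n, 0.0f, thrust::plus<float>());
}
```

With this you don't write any kernel yourself; Thrust picks the launch configuration and handles the multi-pass logic internally.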