Efficient way to split an array in device memory into two?

Hi all,

I am looking for an efficient way to sum all the elements of an array, which is basically a reduction. I am thinking of splitting the array into two arrays of equal length, adding them element-wise in parallel, and calling this function recursively. However, what is a good way to split an array in device memory into two? Or is there an alternative way to compute the sum?
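The recursive-halving idea does not actually require splitting (copying) the array: the "two halves" can simply be two index ranges of the same device buffer. Below is a minimal sketch, assuming a `float` array copied from the host. `addHalves` and `halvingSum` are hypothetical names, and CUDA error checking is omitted for brevity. Each pass folds the upper half onto the lower half in place, so the total ends up in element 0 after about log2(n) kernel launches.

```cuda
#include <vector>
#include <cuda_runtime.h>

// One pass of the fold: the "lower half" is d[0, half) and the "upper half"
// is d[half, n). No copy is needed; each thread just indexes both ranges.
__global__ void addHalves(float* d, int n, int half)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Thread i adds its partner from the upper half. When n is odd, the last
    // element of the lower half has no partner and is left unchanged.
    if (i + half < n) {
        d[i] += d[i + half];
    }
}

// Hypothetical helper: sums h on the GPU by folding the array in half
// repeatedly until a single element remains. Error checking omitted.
float halvingSum(const std::vector<float>& h)
{
    int n = static_cast<int>(h.size());
    if (n == 0) return 0.0f;

    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 256;
    while (n > 1) {
        int half = (n + 1) / 2;                     // size of the lower half
        int blocks = (half + threads - 1) / threads;
        addHalves<<<blocks, threads>>>(d, n, half); // launches on one stream run in order
        n = half;                                   // partial sums now live in d[0, half)
    }

    float result = 0.0f;
    cudaMemcpy(&result, d, sizeof(float), cudaMemcpyDeviceToHost); // waits for the kernels
    cudaFree(d);
    return result;
}
```

For example, `halvingSum(std::vector<float>(1000, 1.0f))` should give 1000. Every pass reads and writes global memory, and there is one launch per level, so this is likely slower than a reduction that does most of its work in shared memory.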

Thanks,

L

Download the CUDA SDK and look at the reduction sample. The whitepaper that comes with it walks through a series of successively optimized versions of the reduction kernel.
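A reduction in the style of the SDK sample works a little differently: each thread block sums its slice of the array in shared memory and writes one partial sum, and the kernel is relaunched on the partial sums until one value remains. The sketch below follows that general idea (sequential addressing, with the first addition done during the global load) but is a simplified illustration, not the SDK's code. `blockSum` and `treeSum` are hypothetical names, the block size is assumed to be a power of two, and error checking is omitted.

```cuda
#include <utility>
#include <vector>
#include <cuda_runtime.h>

// Each block reduces up to 2 * BLOCK input elements to one partial sum.
// BLOCK must be a power of two and equal to blockDim.x.
template <unsigned int BLOCK>
__global__ void blockSum(const float* in, float* out, int n)
{
    __shared__ float s[BLOCK];
    unsigned int tid = threadIdx.x;
    int i = static_cast<int>(blockIdx.x * (BLOCK * 2) + tid);

    // The first addition happens during the load from global memory.
    float v = 0.0f;
    if (i < n) v = in[i];
    if (i + static_cast<int>(BLOCK) < n) v += in[i + BLOCK];
    s[tid] = v;
    __syncthreads();

    // Tree reduction in shared memory (sequential addressing): the lower
    // `stride` threads add the values held by the upper `stride` threads.
    for (unsigned int stride = BLOCK / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = s[0];
}

// Hypothetical helper: relaunches blockSum on the partial sums until one
// value is left. Error checking omitted.
float treeSum(const std::vector<float>& h)
{
    constexpr unsigned int BLOCK = 256;
    constexpr int PER_BLOCK = 2 * BLOCK;
    int n = static_cast<int>(h.size());
    if (n == 0) return 0.0f;

    float* dIn = nullptr;
    float* dOut = nullptr;
    int maxBlocks = (n + PER_BLOCK - 1) / PER_BLOCK;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, maxBlocks * sizeof(float));
    cudaMemcpy(dIn, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    while (n > 1) {
        int blocks = (n + PER_BLOCK - 1) / PER_BLOCK;
        blockSum<BLOCK><<<blocks, BLOCK>>>(dIn, dOut, n);
        std::swap(dIn, dOut); // this pass's partial sums are the next pass's input
        n = blocks;
    }

    float result = 0.0f;
    cudaMemcpy(&result, dIn, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

Ping-ponging between two buffers keeps each pass from reading values that another block in the same pass is overwriting. The whitepaper goes further (for example, unrolling the last steps and letting each thread sum many elements), which this sketch does not attempt.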

Alternatively, you could use the Thrust library, whose `thrust::reduce` computes the sum for you.
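A minimal Thrust sketch, assuming the data starts in a host `std::vector<float>`. `thrustSum` is a hypothetical wrapper name; `thrust::device_vector`, `thrust::reduce`, and `thrust::plus` are the library's own.

```cuda
#include <vector>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/reduce.h>

// Hypothetical wrapper: copies h to the device and sums it with thrust::reduce.
float thrustSum(const std::vector<float>& h)
{
    thrust::device_vector<float> d(h.begin(), h.end()); // host-to-device copy
    // Initial value 0.0f combined with thrust::plus<float> gives an ordinary sum.
    return thrust::reduce(d.begin(), d.end(), 0.0f, thrust::plus<float>());
}
```

If the data already lives in a `thrust::device_vector`, the single `thrust::reduce` call is all that is needed; Thrust picks the kernel launch configuration itself.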

N.