butterfly summation

I just installed CUDA and I am trying to get the hang of programming with it. I started by trying out butterfly summation. Normally in C++ you would sum numbers sequentially, but with CUDA, to save time, I can sum pairs of numbers, then sum those results in pairs, and so on until I have the final sum. For example, say I have 128 numbers to sum, corresponding to 64 pairs. I use 64 threads to sum each pair, which leaves me with 64 numbers to sum (32 pairs). Then I use 32 threads to sum those pairs, which leaves me with 32 numbers to sum (16 pairs), and so on until there are only two numbers left, whose sum is the final answer. I understand the concept, but I don’t understand the syntax. After I fill the array with the numbers, what would the code look like to assign the pairs to the threads?

I would merely call it (array) ‘accessing’ rather than (array) ‘splitting’.

You make the array access conditional on each thread’s ID (threadIdx.x).

So, to sum pairs, you would, for example, tell every thread with an odd ID to add its own array element to the element belonging to the neighbouring even-ID thread.
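
A minimal sketch of what that conditional access could look like, assuming a single block and an array of 2 to 2048 floats (so that n / 2 threads fit in one block). The names pairwiseSum and sumOnGpu are made up for illustration, and error checking is left out. Instead of the odd/even rule, thread t here owns the pair starting at index 2 * stride * t, which gives the same halving pattern (64 active threads, then 32, then 16, ...) and leaves the total in element 0:

```cuda
#include <cuda_runtime.h>

// Sums data[0..n-1] in place; the total ends up in data[0].
// Assumption: launched as one block with n / 2 threads (2 <= n <= 2048).
__global__ void pairwiseSum(float *data, int n)
{
    int tid = threadIdx.x;

    // The stride doubles each pass: 1, 2, 4, ... so the number of
    // threads that still have a pair to add halves each pass.
    for (int stride = 1; stride < n; stride *= 2) {
        int left = 2 * stride * tid;      // left element of this thread's pair
        if (left + stride < n) {          // only threads that still own a pair
            data[left] += data[left + stride];
        }
        __syncthreads();                  // finish this pass before the next one
    }
}

// Hypothetical host helper: copies the input to the GPU, runs the
// kernel, and copies back the total. Error checking is omitted.
float sumOnGpu(const float *host, int n)
{
    float *dev = nullptr;
    float result = 0.0f;

    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    pairwiseSum<<<1, n / 2>>>(dev, n);    // one thread per initial pair

    // This copy waits for the kernel to finish before reading data[0].
    cudaMemcpy(&result, dev, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return result;
}
```

For the 128-number case you would call sumOnGpu(values, 128), which launches 64 threads. Note that __syncthreads() sits outside the if, so every thread in the block reaches it, including the ones that have run out of pairs.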

Also make sure you grasp scanning (prefix sum), as the term is used in the context of GPU programming.

It sounds like you are doing a reduction. There is a “CUDA Parallel Reduction” example among the sample apps installed with CUDA that it might be helpful to look at. For an overview of available CUDA sample apps, take a look at http://docs.nvidia.com/cuda/cuda-samples
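
To give an idea of the general shape of a reduction before you open the sample, here is a simplified sketch. It is not the code from the NVIDIA sample; the names blockReduce, reduceOnGpu and BLOCK_SIZE are assumptions chosen for illustration, and error checking is omitted. Each block sums its slice of the input in shared memory, and the per-block partial sums are then added up on the host:

```cuda
#include <cuda_runtime.h>
#include <vector>

#define BLOCK_SIZE 256  // assumed block size; must be a power of two

// Each block reduces up to BLOCK_SIZE input elements to one partial sum.
__global__ void blockReduce(const float *in, float *partial, int n)
{
    __shared__ float cache[BLOCK_SIZE];

    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    // Load one element per thread; pad with zero past the end of the input.
    cache[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Halve the number of active threads on every pass.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            cache[tid] += cache[tid + stride];
        }
        __syncthreads();
    }

    // Thread 0 writes this block's result.
    if (tid == 0) {
        partial[blockIdx.x] = cache[0];
    }
}

// Hypothetical host helper: runs the kernel, then adds the per-block
// partial sums on the CPU.
float reduceOnGpu(const float *host, int n)
{
    int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    float *devIn = nullptr;
    float *devPartial = nullptr;

    cudaMalloc(&devIn, n * sizeof(float));
    cudaMalloc(&devPartial, blocks * sizeof(float));
    cudaMemcpy(devIn, host, n * sizeof(float), cudaMemcpyHostToDevice);

    blockReduce<<<blocks, BLOCK_SIZE>>>(devIn, devPartial, n);

    std::vector<float> hostPartial(blocks);
    cudaMemcpy(hostPartial.data(), devPartial, blocks * sizeof(float),
               cudaMemcpyDeviceToHost);

    float total = 0.0f;
    for (float p : hostPartial) {
        total += p;
    }

    cudaFree(devIn);
    cudaFree(devPartial);
    return total;
}
```

Working in shared memory keeps the repeated reads and writes off global memory, and splitting the input across blocks removes the single-block size limit. The sample goes further than this with additional optimizations.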