Small array (length 512) reduction inside a loop

Greetings. Let’s say I have code like this:

for (int i = 0; i < 100000; i++)
{
    reduction(on an array of size 512);
    do something useful with the summation;
    randomly change some values of the array;
}

There are two cases here: 1) the changes to the array at each iteration are unpredictable (in both value and location) and large, so I pretty much have to redo the reduction from scratch at each iteration; and 2) the changes at each iteration are unpredictable but small (5-10 elements), so perhaps I can use a different data structure (e.g. a Fenwick tree) to do the reduction.

In case 1), I am currently using shared memory (a 1D array) and doing a reduction akin to the kernel found in the SDK. Any other suggestions (e.g. a 2D array)?
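
To make the access pattern concrete, here is a minimal CPU-side sketch in C++ of the kind of stride-halving tree reduction I mean. It is only an emulation, not my actual kernel and not the SDK code: the name tree_reduce, the use of double, and the power-of-two length are assumptions for illustration. On the GPU, each value of tid in the inner loop would be one thread working on the shared-memory array, with a barrier between steps.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU emulation of a stride-halving tree reduction.
// Assumes data.size() is a power of two (e.g. 512).
// The argument is taken by value so it acts as a scratch copy,
// playing the role the shared-memory array plays on the GPU.
double tree_reduce(std::vector<double> data) {
    for (std::size_t stride = data.size() / 2; stride > 0; stride /= 2) {
        // Each tid below corresponds to one GPU thread in this step;
        // a barrier would be needed before moving to the next stride.
        for (std::size_t tid = 0; tid < stride; ++tid) {
            data[tid] += data[tid + stride];
        }
    }
    // After the last step the full sum sits in element 0.
    return data.empty() ? 0.0 : data[0];
}
```

For 512 elements this takes 9 steps (strides 256, 128, ..., 1), with half as many active threads at each step.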

In case 2), is a Fenwick tree the best data structure to use?
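
For reference, this is a minimal host-side sketch in C++ of the Fenwick tree (binary indexed tree) I have in mind for case 2). The type name FenwickTree, its member names, and the use of double are assumptions for illustration, not code I am actually running.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical Fenwick tree over n values, stored 1-based (tree[0] is unused).
struct FenwickTree {
    std::vector<double> tree;

    // O(n) build: add each value, then push the node's partial sum to its parent.
    explicit FenwickTree(const std::vector<double>& values)
        : tree(values.size() + 1, 0.0) {
        for (std::size_t i = 1; i < tree.size(); ++i) {
            tree[i] += values[i - 1];
            std::size_t parent = i + (i & (~i + 1));  // i + lowest set bit of i
            if (parent < tree.size()) tree[parent] += tree[i];
        }
    }

    // Add delta to the element at a 0-based index; touches O(log n) nodes.
    void add(std::size_t index, double delta) {
        for (std::size_t i = index + 1; i < tree.size(); i += i & (~i + 1)) {
            tree[i] += delta;
        }
    }

    // Sum of the first count elements; reads O(log n) nodes.
    double prefix_sum(std::size_t count) const {
        double sum = 0.0;
        for (std::size_t i = count; i > 0; i -= i & (~i + 1)) {
            sum += tree[i];
        }
        return sum;
    }

    // Sum of the whole array.
    double total() const { return prefix_sum(tree.size() - 1); }
};
```

With 512 elements, each add touches at most 10 nodes, so 5-10 changed elements cost at most about 100 node updates per iteration, compared with 511 additions for a full re-sum. Note that if only the total is needed (not prefix sums), simply adding each change's delta to a running total gives the same result, apart from floating-point rounding.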

Finally, how much speedup would you expect this routine to have over a CPU version, assuming both are well optimized?

EDIT: I am using a Fermi Tesla C2050.

Thanks in advance!