Greetings. Let’s say I have code like this:
for (int i = 0; i < 100000; i++)
{
    sum = reduction(array);    // array has 512 elements
    // do something useful with the sum
    // randomly change some values of the array
}
Two cases here: 1) the values and locations that change at each iteration are unpredictable and numerous, so I pretty much have to redo the reduction from scratch every iteration; and 2) the changes at each iteration are unpredictable but small (5-10 elements), so perhaps I can use a different data structure (e.g. a Fenwick tree) to maintain the reduction incrementally.
In case 1), I am currently using shared memory (a 1D array) and doing a reduction akin to the kernel found in the SDK. Any other suggestions (e.g. a 2D array)?
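For concreteness, what I have in mind is something along these lines (just a rough single-block sketch; the name reduce512 and the pointers d_in/d_out are placeholders, not my actual code):

// Rough sketch: one block of 512 threads sums a 512-element array in shared memory.
__global__ void reduce512(const float *d_in, float *d_out)
{
    __shared__ float sdata[512];
    unsigned int tid = threadIdx.x;

    // Each thread loads one element into shared memory.
    sdata[tid] = d_in[tid];
    __syncthreads();

    // Tree-style reduction in shared memory.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1)
    {
        if (tid < s)
            sdata[tid] += sdata[tid + s];
        __syncthreads();
    }

    // Thread 0 writes the final sum.
    if (tid == 0)
        *d_out = sdata[0];
}

// Launched as: reduce512<<<1, 512>>>(d_in, d_out);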
In case 2), is a Fenwick tree the best data structure to use?
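To clarify what I mean, here is a minimal host-side sketch of the Fenwick (binary indexed) tree idea (the struct and method names are just placeholders): each of the 5-10 changed elements would become an update(i, newValue - oldValue) call in O(log n), and total() gives the refreshed sum.

#include <vector>

// Minimal Fenwick tree sketch over n values (1-based internal array).
struct Fenwick
{
    std::vector<float> tree;
    int n;

    Fenwick(int n_) : tree(n_ + 1, 0.0f), n(n_) {}

    // Add delta to element i (0-based), O(log n).
    void update(int i, float delta)
    {
        for (int x = i + 1; x <= n; x += x & (-x))
            tree[x] += delta;
    }

    // Sum of elements [0, i], O(log n).
    float prefixSum(int i) const
    {
        float s = 0.0f;
        for (int x = i + 1; x > 0; x -= x & (-x))
            s += tree[x];
        return s;
    }

    // Total sum of all n elements.
    float total() const { return prefixSum(n - 1); }
};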
Finally, how much speedup would you expect this routine to have over a CPU version, assuming both are well optimized?
EDIT: I am using a Fermi-based Tesla C2050.
Thanks in advance!