strange floating-point accuracy problem

Something quite strange appeared in a loop.

For example, when computing the sum of several elements of the global array g_idata (assumed to be large enough), I first define a register variable temp = 0 and then do the following (assume it runs in thread 0):

for (int i = 0; i < 16; i++)
{
    temp = temp + g_idata[i];
}
Here, the global data is fetched from contiguous addresses in g_idata, and the result is exactly the same as the one computed on the CPU.

However, when the indices into g_idata are spaced apart from each other, that is:

for (int i = 0; i < 16; i++)
{
    temp = temp + g_idata[i + stride];
}
where stride is a positive integer larger than 1, the resulting value in temp is slightly different from the previous result.

I wonder whether the nvcc compiler applies some optimization that treats the two cases above differently.
If so, how can I avoid that optimization, since I want the exact result?

Note that floating-point addition is not associative. You will get identical results from these two loops only if the exact same values are summed in the exact same order. Feel free to post a small self-contained example if that does not explain the discrepancy you are observing.

The discrepancy results from exactly what you’ve described: I had changed the order of the floating-point numbers in the addition.
Now the problem has been perfectly solved. Thanks.