A problem about different results between CPU and GPU

I compile with -arch=sm_20.
Here is one of my cuda kernel codes:

static __global__ void calxn(int ntotal, double *q_list, double *xn)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    if (i >= 1 && i < ntotal)
    {
        xn[i] = q_list[i + 1] - q_list[i] - 1.0e0;
    }
    if (i == ntotal)
    {
        xn[ntotal] = ntotal - 1.0e0 + q_list[1] - q_list[ntotal];
    }
}

Here is corresponding CPU code:

void c_calxn(int ntotal, double *q_list, double *xn)
{
    int i;
    for (i = 1; i < ntotal; i++)
    {
        xn[i] = q_list[i + 1] - q_list[i] - 1.0e0;
    }

    xn[ntotal] = ntotal - 1.0e0 + q_list[1] - q_list[ntotal];
}

The problem is that when I change "xn[ntotal] = ntotal - 1.0e0 + q_list[1] - q_list[ntotal];" into "xn[ntotal] = q_list[1] - q_list[ntotal] + ntotal - 1.0e0;", the results start to differ. Usually the difference is
0.000000000000007105427357601002, or a number a few times smaller such as 0.000000000000003552713678800501.
I considered that this is normal rounding error in double-precision computing, but this error may accumulate, and it will eventually become a true error.

This small problem has confused me a lot! I would appreciate it if someone could explain it to me.

Do a Google search and read about “DBL_EPSILON”. Any difference that is less than DBL_EPSILON is not significant and should be considered as a zero difference. Your differences are less than DBL_EPSILON.

Floating point errors will always accumulate if you do not take that into account when writing your code. If the errors in your calculation can accumulate to the point where they are significant, then you might have to consider an alternative way to write the algorithm.

Other than that, why are you using 1 and ntotal as the limits for your arrays? In C / C++, array limits should be 0 <= i < ntotal, not 1 <= i <= ntotal as in your code. If your arrays are sized to ntotal, then your code is accessing memory beyond the array bounds (and is ignoring array element 0).

Thanks a lot!

I read about "DBL_EPSILON" after I saw your reply. It's strange that the errors in my CUDA code are sometimes the same as "DBL_EPSILON".

#define DBL_EPSILON 2.2204460492503131e-016 // smallest such that 1.0+DBL_EPSILON != 1.0

However, the errors can still accumulate to about one decimal place, which is unacceptable. I tried to change my algorithm to make sure there are no errors, but then the CUDA program becomes very inefficient.

I intended to ignore array element 0. When I allocate the memory, I allocate 2 extra array elements, but I don't know whether that is safe for my program.