A problem about different results between CPU and GPU

I compile with -arch=sm_20.
Here is one of my cuda kernel codes:

static __global__ void calxn(int ntotal, double *q_list, double *xn)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    if (i >= 1 && i < ntotal)
    {
        xn[i] = q_list[i + 1] - q_list[i] - 1.0e0;
    }
    if (i == ntotal)
    {
        xn[ntotal] = ntotal - 1.0e0 + q_list[1] - q_list[ntotal];
    }
}

Here is corresponding CPU code:

void c_calxn(int ntotal, double *q_list, double *xn)
{
    int i;
    for (i = 1; i < ntotal; i++)
    {
        xn[i] = q_list[i + 1] - q_list[i] - 1.0e0;
    }

    xn[ntotal] = ntotal - 1.0e0 + q_list[1] - q_list[ntotal];
}

The problem is that when I change "xn[ntotal] = ntotal - 1.0e0 + q_list[1] - q_list[ntotal];" into "xn[ntotal] = q_list[1] - q_list[ntotal] + ntotal - 1.0e0;", the results start to differ. Usually the difference is
0.000000000000007105427357601002, or a number a few times smaller such as 0.000000000000003552713678800501.
I considered that this is normal rounding error in double-precision computing, but this error may accumulate, and it will eventually become a true error.

This small problem has confused me a lot! I would appreciate it if someone could explain it to me.

Do a Google search and read about “DBL_EPSILON”. Any difference that is less than DBL_EPSILON is not significant and should be considered as a zero difference. Your differences are less than DBL_EPSILON.

Floating point errors will always accumulate if you do not take that into account when writing your code. If the errors in your calculation can accumulate to the point where they are significant, then you might have to consider an alternative way to write the algorithm.

Other than that, why are you using 1 and ntotal as the limits for your arrays? In C / C++, array limits should be 0 <= i < ntotal, not 1 <= i <= ntotal as in your code. If your arrays are sized to ntotal, then your code is accessing memory beyond the array bounds (and is ignoring array element 0).

Thanks a lot!

I read about "DBL_EPSILON" after I saw your reply. It's strange that the errors in my CUDA code are sometimes the same as "DBL_EPSILON".

#define DBL_EPSILON 2.2204460492503131e-016 // smallest such that 1.0+DBL_EPSILON != 1.0

However, the errors can still accumulate to about one decimal place, which is unacceptable. I tried to change my algorithm to make sure there are no errors, but then the CUDA program becomes very inefficient.

I intended to ignore array element 0. When I allocate the memory, I allocate 2 extra array elements, but I don't know whether that is safe for my program.