Accuracy difference Device vs. Emu

eirik · June 24, 2008, 4:58pm

I have written a a couple of kernels in my project of implementing subdivision using CUDA. The problem that I am seing is that while in emulation mode I get a total difference in the vertices between a regular CPU implementation and my CUDA implementation in the order of 10^-5 which I am happy with. But, when running in device mode, this error climbs to 10^-2 which is quite a lot to say the least! The code for the kernels that calculate new vertices is is given under.

Does anyone see something that could cause this strange behaviour? Any help is appreciated.

__global__

void vertTransKernelV2(float4* vertsIn, float4* vertsOut, uint2** neighList, bool cpy)

{

	int index = blockIdx.x*blockDim.x+threadIdx.x;

	float4 v = vertsIn[index];

	

	uint2* neighs = getNeighList(index, neighList, cpy);

	

	int n = neighs[0].y;

	float4 sumNeigh = make_float4(0.0, 0.0, 0.0, 0.0);

	for(int i = 1; i <= n; i++)

  sumNeigh = sumNeigh+vertsIn[neighs[i].y];

	float alpha = calcAlpha((float)n);

	float4 val = v*(1.0-alpha)+sumNeigh*(alpha*(1.0/(float)n));

	val.w = -1.0;

	

	vertsOut[index] = val;

}

__device__

float calcAlpha(float numNeigh)

{

	return (1.0/9.0)*(4.0-2.0*__cosf((2.0*M_PI)/numNeigh));

}

__global__

void vertGenKernelV2(float4* vertsIn, float4* vertsOut, uint4* indsIn, uint4* indsOut, int lastIndex, uint2** neighList)

{

	int index = blockIdx.x*blockDim.x+threadIdx.x;

	uint4 currTri = indsIn[index];

	float4 v1 = vertsIn[currTri.x];

	float4 v2 = vertsIn[currTri.y];

	float4 v3 = vertsIn[currTri.z];

	

	float4 centroid = (v1+v2+v3)/3.0;

	centroid.w = 0.0;

	int vertOutIndex = lastIndex+index;

	vertsOut[vertOutIndex] = centroid;

}

lokiman · June 24, 2008, 5:02pm

CUDA works (not EMU mode) with float which is not the same as standard IEEE float on x86 CPU… so the result are different …

eirik · June 24, 2008, 5:05pm

I know that, but I just think the difference is too much.

seibert · June 24, 2008, 8:54pm

How big is n typically in your first kernel? If you are summing a large number of values, the accumulated round-off can get rather big. This can be mitigated using Kahan summation (see wikipedia).

eirik · June 24, 2008, 8:57pm

n is the number of neighbours a vertex has in the grid and is not larger than 7 so i do not think this should be the problem.

eirik · June 24, 2008, 9:25pm

I implemented the Kahan sum over the elements to no effect sadly.

E.D_Riedijk · June 25, 2008, 4:54am

when running on CPU, did you set the CPU to use only IEEE float? Otherwise the CPU is using 80bit intermediate results on FPU. You can search the forum for how to do it, there have been some examples posted in the past.

Reimar · June 25, 2008, 6:55am

In that case I would suggest you try to calculate all possible alpha values and store them in constant memory, cos is one of the functions I would be suspicous about here. You could even go further and to the same with 1-alpha and alpha/n, that does then not leave many places that could cause this error…

Topic		Replies	Views
Accumulation Difference between CUDA and CPU? CUDA Programming and Performance	7	10593	August 22, 2009
Different output from emulation and device precision issues on GPU vs CPU CUDA Programming and Performance	9	7342	August 20, 2009
Different results using diffrent memory types CUDA Programming and Performance	22	4608	April 14, 2010
floating point precision on CUDA CUDA Programming and Performance	11	15137	June 8, 2010
Program gives wrong answer except with emulation CUDA Programming and Performance	8	3974	April 21, 2008
precision difference between emulation mode and cpu CUDA Programming and Performance	1	3951	February 11, 2010
CPU and GPU floating point calculations Results are different CUDA Programming and Performance	6	22191	August 7, 2010
CPU Vs GPU output CUDA Programming and Performance	4	1045	September 28, 2010
CUDA floating point CUDA Programming and Performance	4	2184	April 20, 2009
A problem about different results between CPU and GPU CUDA Programming and Performance	2	1127	October 30, 2017

Accuracy difference Device vs. Emu

Related topics