Program gives wrong answer except with emulation

Hi,

I’m sure I’m doing something stupid here, but I have a simple program that works just fine in C++, and the kernel works fine in emulation mode. However, it gives me totally different (and wrong) numbers when I run it on the GPU. I’m only using one thread for simplicity’s sake, and it’s obviously not optimized for speed. The project is attached. Could someone give me an idea of what’s going wrong here? Thanks.

dim3 grid(1, 1, 1);
dim3 onethread(1, 1, 1);

EDCalc<<<grid, onethread>>>(cudagenomearray, cudaedspacingarray, cudarhoarray,
                            cudank, cudadistarray, ptsperiter, boxes, 1.0f / rough);

and

__global__ void
EDCalc(float* genome, float* edspacing, float* rhoarray, float2* dnk, float* distarray,
       int ptsperthread, int refllayers, float roughness)
{
    for (int i = 0; i < ptsperthread; i++)
    {
        dnk[i].x = 0.0f;
        dnk[i].y = 0.0f;
        float temp = 0.0f;

        for (int k = 0; k < refllayers; k++)
        {
            // Signed distance from this point to the current layer boundary,
            // scaled by the reciprocal roughness passed in from the host (1.0f / rough).
            float dist = (edspacing[i] - distarray[k]) * roughness;

            if (dist > 6.0f)
            {
                // erff(dist) has saturated at 1, so 1 + erf(dist) == 2.
                temp += rhoarray[k] * 2.0f;
            }
            else if (dist > -6.0f)
            {
                temp += rhoarray[k] * (1.0f + erff(dist));
            }
            // dist <= -6.0f: 1 + erf(dist) is effectively 0, so nothing is added.
        }

        dnk[i].x = temp;
    }
}

EDIT:

I’m using SDK 2.0 beta on Vista with a GeForce 8600 GT.
CUDAtest.zip (264 KB)

The basic programming model is that threads issue in groups of 32, called warps. I’ve never actually tried it, but it would not be surprising if the driver promoted the number of threads to the nearest multiple of 32. That would be consistent with how the programming guide and the occupancy calculator determine the maximum number of registers.
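If extra threads really were being launched, one defensive option (just a sketch built on the kernel above; the single-thread launch in the original post may not need it) is to guard the body so only the first thread does any work:

__global__ void
EDCalc(float* genome, float* edspacing, float* rhoarray, float2* dnk, float* distarray,
       int ptsperthread, int refllayers, float roughness)
{
    // Even if the launch were padded out to a full warp of 32 threads,
    // only block 0 / thread 0 would touch the output arrays.
    if (blockIdx.x != 0 || threadIdx.x != 0)
        return;

    // ... same loops over ptsperthread and refllayers as above ...
}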

Hi, thanks for the response. Shouldn’t the emulator behave the same way, then?

Are you aware that the GPU currently supports only single precision? This may cause differences in results.

BTW, CUDA is not supported on the 6600 GT; do you mean an 8600?

Haha, yes, I meant the 8600. Nice catch. The precision shouldn’t matter here: I’m using floats in emulation and in the C++ part as well. Also, I’m pretty sure it’s not an underflow, because the final output file reports all of the nk values divided by the last value (for some reason this calculation is always reported that way), and those ratios are as expected. I didn’t even realize anything was wrong until I plugged the numbers into the next step. So whatever I (or CUDA) am doing wrong, it is consistently wrong. Thanks for any help.

Even if you use float in host code, the x87 FPU uses extended (80-bit) precision for intermediate calculations.
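A quick way to see whether host-side extended precision is involved (a sketch using _controlfp_s from the MSVC CRT; it assumes 32-bit x87 code generation, and the exact outputs depend on the compiler):

#include <float.h>
#include <stdio.h>

int main(void)
{
    unsigned int prev;
    volatile float a = 1.0e8f, b = 1.0f;

    /* With the default control word, x87 intermediates may be kept in 80-bit
       precision, so (a + b) - a can come out as 1 instead of the
       single-precision result of 0. */
    printf("default:  %g\n", (double)((a + b) - a));

    /* Force 24-bit (single) precision, closer to what the GPU computes. */
    _controlfp_s(&prev, _PC_24, _MCW_PC);
    printf("24-bit:   %g\n", (double)((a + b) - a));
    return 0;
}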

You can use _set_controlfp() to force the FPU to single precision. Give it a try and check whether your results still differ between the GPU and emulation mode.

Thanks for the response. I added

_set_controlfp(_PC_24, _MCW_PC);

right before the kernel call in emulation mode, and I still get the correct answer in emulation and the wrong answer from the GPU. Any other ideas? Thanks.

It could be that you don’t have all of the arrays copied to GPU device/constant memory. This would not show up in emulation mode, because kernels can access host memory there.
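For reference, the usual pattern looks like this (a minimal sketch; host_edspacing and host_dist are hypothetical host-side buffers, and the sizes are guesses based on the launch arguments above):

float* cudaedspacingarray;
cudaMalloc((void**)&cudaedspacingarray, ptsperiter * sizeof(float));
cudaMemcpy(cudaedspacingarray, host_edspacing, ptsperiter * sizeof(float),
           cudaMemcpyHostToDevice);

float* cudadistarray;
cudaMalloc((void**)&cudadistarray, boxes * sizeof(float));
cudaMemcpy(cudadistarray, host_dist, boxes * sizeof(float),
           cudaMemcpyHostToDevice);

// ...and likewise for the other inputs. After the kernel finishes, the
// result in cudank has to be copied back with cudaMemcpyDeviceToHost.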

Well, I figured it out. In my kernel, I accidentally had one of my variables declared as a double instead of a float. While this was accidental, I was under the impression that CUDA simply translated doubles to float. Instead, using a double gives a completely wrong answer. Very odd. Thanks for all of the help.
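For anyone who hits the same thing: the 8600 GT is a compute capability 1.1 part with no double-precision hardware, so a stray double in device code does not behave the way it does in emulation mode, where the host compiler keeps real doubles. The fix is simply to keep the device code entirely in single precision; a sketch of the kind of change (temp here is illustrative, standing in for whichever variable was declared double):

// Problematic on a compute 1.1 device such as the 8600 GT:
// double temp = 0.0;

float temp = 0.0f;                            // single-precision accumulator
temp += rhoarray[k] * (1.0f + erff(dist));    // float literals and erff(), not erf()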