Different results emu versus real runs

Netla · September 10, 2007, 1:59pm

I am taking my first steps in CUDA programming, but even with the manual and the many posts on this forum, something is still eluding me.

I am running a nested loop of four levels (z=74,x=116,s=64,r=64). As a first test, I wanted to split it over several blocks and threads. I assigned the x-loop to blockIdx.x, the s-loop to blockIdx.y and the r-loop to threadIdx.x.

Host code:

    dim3 grid(116,64,1);     //working: dim3 grid(116,64,1);
    dim3 threads(64,1,1);    //working: dim3 threads(1,1,1);
    calcTdomain<<< grid, threads >>>(dfactor, dP_data, droi);
    cudaThreadSynchronize();

Device code:

(dP_data is an array with data, droi is the array in which results are written, dfactor is just a lookup table)

__global__ void calcTdomain(const float* dfactor, const float* dP_data, float* droi)
{
    const int M = 116;
    const int N = 64;

    unsigned int izstart = 0;
    unsigned int izend = 74;
    unsigned int ixstart = blockIdx.x;
    unsigned int ixend = ixstart+1;
    unsigned int isstart = blockIdx.y;
    unsigned int isend = isstart+1;
    unsigned int irstart = threadIdx.x;
    unsigned int irend = irstart+1;

    for(unsigned int iz=izstart; iz < izend; ++iz)
    {
        for(unsigned int ix=ixstart; ix < ixend; ++ix)
        {
            for(unsigned int is=isstart; is < isend; ++is)
            {
                for(unsigned int ir=irstart; ir < irend; ++ir)
                {
                    droi[iz*M+ix] = droi[iz*M+ix] +
                        dP_data[ int(dfactor[iz*M*N*N*2 + ix*N*N*2 + is*N*2 +ir*2])*N*N+is*N+ir]
                        * dfactor[iz*M*N*N*2 + ix*N*N*2 + is*N*2 +ir*2+1];
                }//ir
            }//is
        }//ix
    }//iz
}

In emulation mode, the results written to droi are correct, and I checked with printf statements that the correct z, x, s, r values are used. On the graphics card itself, however, the results are wrong: it looks as if only part of the sum ends up in droi.

I am using a GeForce 8800 Ultra with the CUDA 1.0 SDK.

Any help, tips or tricks would be very much appreciated.

Netla · September 11, 2007, 10:11am

Solved.
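The emulator performed the summation sequentially, one thread after another, but that is not the case on the graphics card itself. There the threads run concurrently, and all threads with the same blockIdx.x add to the same droi[iz*M+ix], so their updates overwrite each other.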
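
For anyone who lands here with the same symptom: one possible way to avoid the conflicting writes is to give each output element to a single block and combine the per-thread partial sums in shared memory. This is only a sketch under assumptions, not necessarily the fix used in this thread. The names calcTdomainReduced and runReduced are hypothetical, gridDim.x stands in for the constant M, and the test data in runReduced is made up so that the expected value is easy to check.

```cuda
#include <vector>
#include <cuda_runtime.h>

const int N = 64;  // extent of the s and r loops, as in the post

// One block per output element droi[iz*M + ix]; gridDim.x plays the role of M.
// Launch with dim3 grid(M, Z, 1) and dim3 threads(N, 1, 1).
__global__ void calcTdomainReduced(const float* dfactor, const float* dP_data, float* droi)
{
    __shared__ float partial[N];
    const unsigned int M  = gridDim.x;
    const unsigned int ix = blockIdx.x;
    const unsigned int iz = blockIdx.y;
    const unsigned int ir = threadIdx.x;

    // Each thread sums its own ir column over all is in a private register.
    float sum = 0.0f;
    for (unsigned int is = 0; is < N; ++is) {
        const unsigned int f = iz*M*N*N*2 + ix*N*N*2 + is*N*2 + ir*2;
        sum += dP_data[int(dfactor[f])*N*N + is*N + ir] * dfactor[f + 1];
    }
    partial[ir] = sum;
    __syncthreads();

    // Tree reduction in shared memory: halve the number of active threads each step.
    for (unsigned int stride = N / 2; stride > 0; stride >>= 1) {
        if (ir < stride) partial[ir] += partial[ir + stride];
        __syncthreads();
    }

    // Only thread 0 of the one block that owns this element writes it, so no race.
    if (ir == 0) droi[iz*M + ix] += partial[0];
}

// Hypothetical harness (not from the post): every lookup index is 0 and every
// weight and sample is 1, so each output element should equal N*N.
std::vector<float> runReduced(int Z, int M)
{
    std::vector<float> hfactor((size_t)Z * M * N * N * 2);
    for (size_t i = 0; i < hfactor.size(); i += 2) {
        hfactor[i]     = 0.0f;  // index into dP_data
        hfactor[i + 1] = 1.0f;  // weight
    }
    std::vector<float> hP((size_t)N * N, 1.0f);
    std::vector<float> hroi((size_t)Z * M, 0.0f);

    float *dfactor = 0, *dP_data = 0, *droi = 0;
    cudaMalloc(&dfactor, hfactor.size() * sizeof(float));
    cudaMalloc(&dP_data, hP.size() * sizeof(float));
    cudaMalloc(&droi, hroi.size() * sizeof(float));
    cudaMemcpy(dfactor, hfactor.data(), hfactor.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dP_data, hP.data(), hP.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(droi, 0, hroi.size() * sizeof(float));  // the kernel accumulates into droi

    dim3 grid(M, Z, 1);
    dim3 threads(N, 1, 1);
    calcTdomainReduced<<<grid, threads>>>(dfactor, dP_data, droi);

    // A blocking cudaMemcpy on the default stream also waits for the kernel.
    cudaMemcpy(hroi.data(), droi, hroi.size() * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dfactor);
    cudaFree(dP_data);
    cudaFree(droi);
    return hroi;
}
```

With the made-up data above, runReduced(2, 3) should return six values that all equal N*N. For the sizes in the original post, the launch would use dim3 grid(116,74,1) and dim3 threads(64,1,1). On newer hardware, atomicAdd on floats would be another option, but the 8800 Ultra does not support it, which is why this sketch relies only on shared memory and __syncthreads().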
