Transforming a C function into CUDA

syoon · February 10, 2011, 11:50pm

Do you think the following transformation from CPU code into CUDA is correct?

Somehow I am getting a slightly different result.

The location of __syncthreads(); does not seem to be a big factor…

What did I do wrong?

Any comments are welcome, and thanks in advance…

=================================================================

CUDA version

{
  int i,j,k,ns,nc,m;
  REAL temp;

  nc=kmax+2;
  ns=nc*(jmax+2);

  for (int slide = 0; slide <= (kmax+blockDim.z-1); slide += blockDim.z)
  {
    i = blockDim.x*blockIdx.x+threadIdx.x;
    j = blockDim.y*blockIdx.y+threadIdx.y;
    k = slide + threadIdx.z;

    if ( i<=imax && j<=jmax && k<=kmax )
    {
      m = i*ns+j*nc+k ;
      temp= (PHI[m]-PHI[m-ns])/dltx+(PHI[m+ns]-PHI[m])/dltx ;
      PHI[m] = PHI[m]-delt*temp;
    }
    __syncthreads();
  }
}

======================================================================
CPU version

for (i=0;i<=imax;i++)
  for (j=0;j<=jmax;j++)
    for (k=0;k<=kmax;k++)
    {
      m = i*ns+j*nc+k ;
      temp=(PHI[m]-PHI[m-ns])/dltx+(PHI[m+ns]-PHI[m])/dltx;
      PHI[m] = PHI[m]-delt*temp;
    }

varslan · February 11, 2011, 1:44pm

Hi syoon,

I assume your PHI is in global memory.
__syncthreads(); synchronizes only the threads within one thread block,
so your reads of PHI[m+ns], PHI[m] and PHI[m-ns] are not safe (a race between threads): some other thread, possibly in another block, may already have changed the value.
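One common way to remove this kind of race without shared memory is double buffering: every thread reads only from an "old" array and writes only to a "new" one, and the host swaps the two pointers between time steps. The sketch below is an editor's illustration of that swapped-in technique, not code from this thread. It assumes REAL is double, that PHI has a one-cell halo so it holds (imax+2)*(jmax+2)*(kmax+2) values, and that interior indices run from 1; the names advect_point, advect_jacobi and advect_jacobi_host are hypothetical. Note that it reproduces a Jacobi-style update in which every neighbor is an old value, whereas the in-place CPU loop reads a PHI[m-ns] that was already updated earlier in the same sweep, so the two can still disagree.

```cuda
#include <cassert>
#include <cuda_runtime.h>

typedef double REAL;  // assumption: the thread never shows REAL's definition

// One explicit update of interior point m that reads only the old buffer.
__host__ __device__ inline REAL advect_point(const REAL* phi_old, int m,
                                             int ns, REAL dltx, REAL delt)
{
    REAL temp = (phi_old[m] - phi_old[m - ns]) / dltx
              + (phi_old[m + ns] - phi_old[m]) / dltx;
    return phi_old[m] - delt * temp;
}

// Same 2-D grid plus slide loop over k as the original kernel, but with
// 1-based interior indices and separate input/output buffers.
__global__ void advect_jacobi(const REAL* phi_old, REAL* phi_new,
                              int imax, int jmax, int kmax,
                              REAL dltx, REAL delt)
{
    int nc = kmax + 2;
    int ns = nc * (jmax + 2);
    int i = blockDim.x * blockIdx.x + threadIdx.x + 1;
    int j = blockDim.y * blockIdx.y + threadIdx.y + 1;
    // No __syncthreads(): phi_old is never written, and each phi_new[m]
    // is written by exactly one thread.
    for (int slide = 0; slide < kmax; slide += blockDim.z) {
        int k = slide + threadIdx.z + 1;
        if (i <= imax && j <= jmax && k <= kmax) {
            int m = i * ns + j * nc + k;
            phi_new[m] = advect_point(phi_old, m, ns, dltx, delt);
        }
    }
}

// Host reference with the same semantics, for checking the kernel's output.
void advect_jacobi_host(const REAL* phi_old, REAL* phi_new,
                        int imax, int jmax, int kmax, REAL dltx, REAL delt)
{
    assert(phi_old != phi_new);  // the buffers must not alias
    int nc = kmax + 2;
    int ns = nc * (jmax + 2);
    for (int i = 1; i <= imax; i++)
        for (int j = 1; j <= jmax; j++)
            for (int k = 1; k <= kmax; k++) {
                int m = i * ns + j * nc + k;
                phi_new[m] = advect_point(phi_old, m, ns, dltx, delt);
            }
}
```

Before the first step both buffers need the same halo values, for example via cudaMemcpy(phi_new, phi_old, bytes, cudaMemcpyDeviceToDevice); after each launch, swap the two pointers on the host. Consecutive kernel launches in the same stream are ordered, so the end of one launch acts as the grid-wide barrier that __syncthreads() cannot provide.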

syoon · February 11, 2011, 5:37pm

Thanks for your reply…

Yes, it's in global memory.

I am fairly new to CUDA, and I suspected PHI might have the problem you just described, but I was not sure…

Does this mean I need to move it to shared memory first? Is there another way besides shared memory, since the size may become larger?

I will try, but could you please give me some advice?

varslan · February 11, 2011, 7:33pm

======================================================================

CPU version

for (i=0;i<=imax;i++)
  for (j=0;j<=jmax;j++)
    for (k=0;k<=kmax;k++)
    {
      m = i*ns+j*nc+k ;
      temp=(PHI[m]-PHI[m-ns])/dltx+(PHI[m+ns]-PHI[m])/dltx;
      PHI[m] = PHI[m]-delt*temp;
    }

First, check whether your CPU version is correct:

temp=(PHI[m]-PHI[m-ns])/dltx+(PHI[m+ns]-PHI[m])/dltx; <==> temp=(PHI[m+ns]-PHI[m-ns])/dltx;
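Written out with $\Phi$ for PHI, $\Delta x$ for dltx, $n_s$ for ns and (reading delt as a time step) $\Delta t$ for delt, the shared $\Phi_m$ cancels. In the in-place CPU loop, the $\Phi_{m-n_s}$ on the right is the value already updated earlier in the same sweep, while $\Phi_{m+n_s}$ is still the old one:

```latex
\frac{\Phi_m - \Phi_{m-n_s}}{\Delta x} + \frac{\Phi_{m+n_s} - \Phi_m}{\Delta x}
  = \frac{\Phi_{m+n_s} - \Phi_{m-n_s}}{\Delta x},
\qquad
\Phi_m \leftarrow \Phi_m - \Delta t\,\frac{\Phi_{m+n_s} - \Phi_{m-n_s}}{\Delta x}
```

The identity is exact algebraically, but in floating point the two forms can round differently, so a CPU version and a GPU version that use different forms may disagree in the last bits.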

Moreover, with

m = i*ns+j*nc+k ;
nc=kmax+2;
ns=nc*(jmax+2);

when i=j=k=0, m=0, so PHI[m-ns] reads before the start of the array and you should get a segmentation fault.
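Spelling out the index arithmetic with the definitions above shows why a 0-based loop fails at the lower end and a 1-based interior stays in bounds. The upper bound assumes PHI holds at least $(i_{\max}+2)\,n_s$ values; the allocation is not shown in the thread, so that part is an assumption:

```latex
\begin{aligned}
& m = i\,n_s + j\,n_c + k, \qquad n_c = k_{\max}+2, \qquad n_s = n_c\,(j_{\max}+2) \\
& i=j=k=0: \quad m - n_s = -n_s < 0 \quad \text{(read before the start of PHI)} \\
& 1 \le i \le i_{\max},\ 1 \le j \le j_{\max},\ 1 \le k \le k_{\max}: \\
& \qquad m - n_s = (i-1)\,n_s + j\,n_c + k \ \ge\ n_c + 1 > 0 \\
& \qquad m + n_s = (i+1)\,n_s + j\,n_c + k \ \le\ (i_{\max}+1)\,n_s + j_{\max}\,n_c + k_{\max} \ <\ (i_{\max}+2)\,n_s
\end{aligned}
```

The last step uses $n_s = j_{\max} n_c + 2 n_c$ and $k_{\max} < 2 n_c$.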

syoon · February 14, 2011, 8:18pm

Well, you have an eagle’s eye…

The indices in the C version start from 1…

So for the CUDA version I have

i = blockDim.x*blockIdx.x+threadIdx.x+1;
j = blockDim.y*blockIdx.y+threadIdx.y+1;
k = slide + threadIdx.z+1;

I didn't want to confuse people with the “+1”.

The original code is much longer than what I posted; I cut it down to keep it simple…
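With the +1 offset, thread 0 of block 0 maps to the first interior point, so the grid only has to cover imax points in x and jmax in y. The helpers below are a hypothetical sketch of that launch arithmetic (blocks_for, lin_index and grid_for are not from the thread), assuming the same padded layout nc = kmax+2, ns = nc*(jmax+2) and the original 2-D grid with a slide loop over k.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Blocks needed to cover n points with b threads per block (ceiling division).
__host__ __device__ inline int blocks_for(int n, int b)
{
    assert(b > 0);
    return (n + b - 1) / b;
}

// Linear index of point (i,j,k) in the padded layout used in the thread.
__host__ __device__ inline int lin_index(int i, int j, int k, int jmax, int kmax)
{
    int nc = kmax + 2;
    int ns = nc * (jmax + 2);
    return i * ns + j * nc + k;
}

// i = blockDim.x*blockIdx.x + threadIdx.x + 1 starts at 1, so x only has to
// cover imax points and y only jmax; k is handled by the slide loop, so z = 1.
inline dim3 grid_for(int imax, int jmax, dim3 block)
{
    return dim3(blocks_for(imax, block.x), blocks_for(jmax, block.y), 1);
}
```

With k = slide + threadIdx.z + 1, the loop condition slide < kmax is enough. The original bound slide <= kmax+blockDim.z-1 adds exactly one extra pass in which every k exceeds kmax, which the if guard then discards, so it is harmless but unnecessary.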

I tried the following:

if ( i<=imax && j<=jmax && k<=kmax )
{
      m = i*ns+j*nc+k ;
      PHI_m = PHI[m];
      PHI_mms = PHI[m-ns];
      PHI_pms = PHI[m+ns];
      __syncthreads();
      CNVTPHI=(PHI_m-PHI_mms)/dltx+(PHI_pms-PHI_m)/dltx;
      PHI[m] = PHI_m-delt*CNVTPHI;
}
__syncthreads();

In theory I did it correctly, I think(?)

Any problems with my thinking?

Thanks well in advance.
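Two points about the attempt above follow from the CUDA programming model. A __syncthreads() inside the if is only allowed when the condition evaluates the same way for every thread of the block (otherwise the code is likely to hang or misbehave), and even then it orders only the threads of one block, so a thread in another block can still overwrite PHI[m+ns] or PHI[m-ns] between the loads and the store. A different option, sketched here as an editor's illustration under stated assumptions, uses the fact that the CPU loop's only dependency runs along i: PHI[m-ns] is already updated when PHI[m] is computed, PHI[m+ns] is not, and m±ns never leaves the (j,k) column. Giving each thread one (j,k) column and looping over i inside the thread therefore reproduces the CPU order with no synchronization. REAL as double, the halo layout, and the names sweep_column, advect_columns and advect_cpu are assumptions, not taken from the thread.

```cuda
#include <cassert>
#include <cuda_runtime.h>

typedef double REAL;  // assumption: REAL's definition is not shown in the thread

// Update one (j,k) column in increasing i, exactly like the CPU loop body:
// PHI[m-ns] was written in the previous iteration, PHI[m+ns] not yet.
__host__ __device__ inline void sweep_column(REAL* PHI, int j, int k,
                                             int imax, int jmax, int kmax,
                                             REAL dltx, REAL delt)
{
    int nc = kmax + 2;
    int ns = nc * (jmax + 2);
    for (int i = 1; i <= imax; i++) {
        int m = i * ns + j * nc + k;
        REAL temp = (PHI[m] - PHI[m - ns]) / dltx + (PHI[m + ns] - PHI[m]) / dltx;
        PHI[m] = PHI[m] - delt * temp;
    }
}

// One thread per (j,k) column; columns are disjoint, so no __syncthreads()
// and no cross-block ordering is required. x maps to k for contiguous access.
__global__ void advect_columns(REAL* PHI, int imax, int jmax, int kmax,
                               REAL dltx, REAL delt)
{
    int k = blockDim.x * blockIdx.x + threadIdx.x + 1;
    int j = blockDim.y * blockIdx.y + threadIdx.y + 1;
    if (j <= jmax && k <= kmax)
        sweep_column(PHI, j, k, imax, jmax, kmax, dltx, delt);
}

// The CPU loop from the thread, with the 1-based indices, as a reference.
void advect_cpu(REAL* PHI, int imax, int jmax, int kmax, REAL dltx, REAL delt)
{
    assert(imax >= 0 && jmax >= 0 && kmax >= 0);
    int nc = kmax + 2;
    int ns = nc * (jmax + 2);
    for (int i = 1; i <= imax; i++)
        for (int j = 1; j <= jmax; j++)
            for (int k = 1; k <= kmax; k++) {
                int m = i * ns + j * nc + k;
                REAL temp = (PHI[m] - PHI[m - ns]) / dltx + (PHI[m + ns] - PHI[m]) / dltx;
                PHI[m] = PHI[m] - delt * temp;
            }
}
```

Launch it with a 2-D grid whose x dimension covers kmax and whose y dimension covers jmax. GPU results should match the CPU loop up to floating-point contraction: nvcc may fuse PHI[m] - delt*temp into a multiply-add, which the --fmad=false option turns off. The trade-off is less parallelism (jmax*kmax threads instead of imax*jmax*kmax), which may or may not matter for your grid sizes.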
