A question about <<<1,256>>> and <<<16,16>>>

bsforever · December 16, 2008, 2:35am

Hello, everyone.

Lately I have been calling my kernel with both <<<1,256>>> and <<<16,16>>>. With <<<1,256>>> I get the right result, but with <<<16,16>>> I do not: the answer comes out as 1.#QNAN0. I do not know why.

At the beginning of my kernel, I define i as:

int i;
i=blockIdx.x*blockDim.x+threadIdx.x;

Thanks!

Quoc_Vinh · December 16, 2008, 3:00am

Try this.

Case 1:

[b]dim3 blockSize(256, 1, 1);
dim3 gridSize(1, 1, 1);
<<<gridSize, blockSize>>>

int i;
i = blockIdx.x * blockDim.x + threadIdx.x;[/b]

Case 2:

[b]dim3 blockSize(16, 1, 1);
dim3 gridSize(16, 1, 1);
<<<gridSize, blockSize>>>

int i;
i = blockIdx.x * blockDim.x + threadIdx.x;[/b]
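As a quick sanity check, the sketch below suggests that both launch configurations produce the same 256 global indices when each thread works only on its own element. The kernel `write_global_index` and the host helper `covers_all_indices` are hypothetical names made up for this illustration; they are not part of the original program.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: each thread records its own global index.
__global__ void write_global_index(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = i;
    }
}

// Hypothetical host helper: launches with the given configuration and
// returns true if every element 0..n-1 holds its own index afterwards.
bool covers_all_indices(int blocks, int threadsPerBlock, int n)
{
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return false;
    // Set every byte to 0xFF (each int becomes -1) so unwritten slots show up.
    cudaMemset(d_out, 0xFF, n * sizeof(int));

    write_global_index<<<blocks, threadsPerBlock>>>(d_out, n);

    std::vector<int> h_out(n);
    // A blocking cudaMemcpy on the default stream waits for the kernel first.
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) {
        if (h_out[i] != i) return false;
    }
    return true;
}
```

Calling `covers_all_indices(1, 256, 256)` and `covers_all_indices(16, 16, 256)` should both report full coverage, which points away from the index formula and toward how threads in different blocks interact.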

bsforever · December 16, 2008, 5:21am

Thanks for your reply, but the problem remains after I tried it. This is very strange.

tmurray · December 16, 2008, 5:44am

You have a race condition between multiple blocks (or you aren’t handling multiple blocks correctly at all).
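For context, `__syncthreads()` is a barrier only for the threads of one block, so a thread that reads a neighbour's value across a block boundary may see stale data. One common remedy, sketched here under simplifying assumptions, is to split each dependent phase into its own kernel launch: kernels issued to the same stream run in order, so the launch boundary acts as a grid-wide barrier. The kernel names, the helper `node_forces`, and the simplified force formula are hypothetical and only loosely modelled on the neighbour accesses a kernel of this kind would make.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Phase 1 (hypothetical, simplified): element value from neighbouring nodes.
// Reads disp[i+1], which may belong to a thread in another block.
__global__ void compute_element_force(const float *disp, float *fe, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n - 1) {
        fe[i] = disp[i + 1] - disp[i];
    }
}

// Phase 2: nodal value from neighbouring elements. Reads fe[i-1], which is
// safe only because phase 1 has completed for the whole grid.
__global__ void assemble_node_force(const float *fe, float *bigf, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (i == 0)          bigf[i] = -fe[0];
    else if (i == n - 1) bigf[i] = fe[n - 2];
    else                 bigf[i] = fe[i - 1] - fe[i];
}

// Hypothetical host helper (assumes disp.size() >= 2; error checks omitted
// for brevity). Returns the assembled nodal values.
std::vector<float> node_forces(const std::vector<float> &disp, int threadsPerBlock)
{
    int n = static_cast<int>(disp.size());
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;

    float *d_disp = nullptr, *d_fe = nullptr, *d_bigf = nullptr;
    cudaMalloc(&d_disp, n * sizeof(float));
    cudaMalloc(&d_fe, n * sizeof(float));
    cudaMalloc(&d_bigf, n * sizeof(float));
    cudaMemcpy(d_disp, disp.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Two launches on the default stream: the second starts only after
    // every block of the first has finished.
    compute_element_force<<<blocks, threadsPerBlock>>>(d_disp, d_fe, n);
    assemble_node_force<<<blocks, threadsPerBlock>>>(d_fe, d_bigf, n);

    std::vector<float> bigf(n);
    cudaMemcpy(bigf.data(), d_bigf, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_disp);
    cudaFree(d_fe);
    cudaFree(d_bigf);
    return bigf;
}
```

In a time-stepping code, the host would repeat the pair of launches once per step instead of looping inside a single kernel. The extra launches cost some overhead, but the result no longer depends on how the grid is divided into blocks.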

alex_dubinsky · December 17, 2008, 2:03am

Right. Are you using shared memory? (That’s where such problems occur)

bsforever · December 17, 2008, 5:04am

Thanks for your reply. I still have this problem, and it is driving me crazy!

This is my CUDA code. Thanks for your advice.

[codebox]__global__ void test(float *dforce, float *dvel, float *da, int *difix, float *dmass, float *fe, int *dX, float *dbigf, float *ddisp, float *ddelt, float *lentemp, float *strain,
                     float *Pstress, float area)
{
    int i;
    i = blockIdx.x * blockDim.x + threadIdx.x;
    float E = 30E6;
    float densityo = 0.000724;

    if (i < NODNUM)
    {
        ddisp[i] = 0.0E0;
    }
    __syncthreads();

    int k = 0;
    while (k < STEP)
    {
        if (i < NODNUM)
            dbigf[i] = 0;
        __syncthreads();

        if (i < NODNUM)
        {
            lentemp[i] = 0.0E0;
            strain[i] = 0.0E0;
            Pstress[i] = 0.0E0;
        }
        __syncthreads();

        if (i < NODNUM - 1)
        {
            lentemp[i] = dX[i+1] + ddisp[i+1] - dX[i] - ddisp[i];
            ddelt[i] = 0.7 * lentemp[i] / sqrt(E / densityo);
            strain[i] = (ddisp[i+1] - ddisp[i]) / lentemp[i];
            __syncthreads();
            Pstress[i] = E * strain[i];
            __syncthreads();
            fe[i] = area * Pstress[i];
            __syncthreads();
        }

        if (i == 0)
        {
            dbigf[0] = -fe[0];
        }
        if (i == 1)
        {
            dbigf[NODNUM-1] = fe[NODNUM-2];
        }
        if (i > 0 && i < NODNUM - 1)
            dbigf[i] = fe[i-1] - fe[i];
        __syncthreads();

        da[i] = (dforce[i] - dbigf[i]) / dmass[i];
        if (difix[i] == 1) da[i] = 0;
        dvel[i] = dvel[i] + ddelt[NODNUM-2] * da[i];
        ddisp[i] = ddisp[i] + ddelt[NODNUM-2] * dvel[i];
        __syncthreads();

        k++;
    }
}[/codebox]

E.D_Riedijk · December 17, 2008, 5:33am

[codebox]
    if (i < NODNUM - 1)
    {
        lentemp[i] = dX[i+1] + ddisp[i+1] - dX[i] - ddisp[i];
        ddelt[i] = 0.7 * lentemp[i] / sqrt(E / densityo);
        strain[i] = (ddisp[i+1] - ddisp[i]) / lentemp[i];
        __syncthreads();
        Pstress[i] = E * strain[i];
        __syncthreads();
        fe[i] = area * Pstress[i];
        __syncthreads();
    }
[/codebox]

I think you are using far more __syncthreads() calls than needed. Apart from that, the code above has a __syncthreads() that can deadlock: threads with i >= NODNUM-1 never reach the __syncthreads() calls inside the if block, while the other threads in the block do reach them and wait indefinitely.
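A minimal sketch of the usual pattern follows: do the guarded work inside the `if`, but place the barrier outside it so that every thread of the block reaches the same `__syncthreads()`. The kernel `adjacent_diff`, the helper `adjacent_diff_host`, and the constant `BLOCK` are hypothetical names for this illustration, and the sketch assumes a single block with 2 <= n <= BLOCK.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int BLOCK = 256;  // hypothetical block size for this sketch

// Single-block sketch: out[t] = in[t+1] - in[t] for t < n-1,
// staged through shared memory.
__global__ void adjacent_diff(const float *in, float *out, int n)
{
    __shared__ float tile[BLOCK];
    int t = threadIdx.x;

    if (t < n) {
        tile[t] = in[t];          // guarded work stays inside the branch
    }
    // The barrier sits outside the conditional, so all threads in the
    // block reach it, including those with t >= n.
    __syncthreads();

    if (t < n - 1) {
        out[t] = tile[t + 1] - tile[t];
    }
}

// Hypothetical host helper (assumes 2 <= in.size() <= BLOCK; error checks
// omitted for brevity).
std::vector<float> adjacent_diff_host(const std::vector<float> &in)
{
    int n = static_cast<int>(in.size());
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, (n - 1) * sizeof(float));
    cudaMemcpy(d_in, in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    adjacent_diff<<<1, BLOCK>>>(d_in, d_out, n);

    std::vector<float> out(n - 1);
    cudaMemcpy(out.data(), d_out, (n - 1) * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}
```

This only addresses synchronization within one block. Dependencies that cross block boundaries still need a different mechanism, such as separate kernel launches.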