# Thread control: looking for help

I want to generate the following 4x4 matrix:
{3, 5, 7, 9,
9, 21, 37, 47,
31, 89, 173, 221,
121, 383, 777, 999
}
The C code that generates the matrix is as follows:
for(i=0;i<4;i++) // row
{
  for(j=0;j<4;j++) // column
  {
    if(j==0)
      k=buf[j]+buf[j+1]+1;
    else
      k=buf[j]+buf[j+1]+buf[j-1];

    buf[j]=k;       // later columns and the next row read this value
    d1[i*dx+j]=k;   // dx is the row stride of the output array

  }
}
Initially, buf[0] through buf[4] are all set to 1.
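For reference, the recurrence can be wrapped in a small self-contained C function so the matrix above can be checked directly. This is only a sketch: the names `fill_serial` and `N` are placeholders that do not appear in the original code, and `buf` is assumed to hold N+1 integers.

```c
#define N 4  /* rows and columns of the matrix in this post (placeholder name) */

/* Fill d1 (row-major, row stride dx) with the recurrence described above.
   fill_serial is a hypothetical helper name used only for illustration. */
void fill_serial(int *d1, int dx)
{
    int buf[N + 1];
    int i, j, k;

    for (j = 0; j <= N; j++)
        buf[j] = 1;                      /* buf[0]..buf[4] start at 1 */

    for (i = 0; i < N; i++) {            /* row */
        for (j = 0; j < N; j++) {        /* column */
            if (j == 0)
                k = buf[j] + buf[j + 1] + 1;
            else
                k = buf[j] + buf[j + 1] + buf[j - 1];
            buf[j] = k;                  /* value reused by column j+1 and by the next row */
            d1[i * dx + j] = k;
        }
    }
}
```

Calling `fill_serial(d1, 4)` on an array of 16 ints should reproduce the matrix listed at the top of the post.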
Problem: because the computation is recursive (each element depends on values computed earlier in the same row and in the previous row), I let one row compute its first 2 columns before the next row starts working. I create 4 threads, and each thread handles one row.
The CUDA code is:
int k=-1; // k is the column number this thread last computed
for(i=0;i<(3+2);i++)
{
  if(tid==i) active=1; // use active to control which threads work
  for(j=0;j<2;j++)
  {
    if(k>3) active=0; // if the column number of a thread exceeds 3, it is deactivated
    if(active==1)
    {
      k++;
      if(k==0)
        result=buf_d[k]+buf_d[k+1]+1;
      else
        result=buf_d[k]+buf_d[k+1]+buf_d[k-1];

      buf_d[k]=result;

    }

  }
}

Only the data in the first row is correct, and I don't know why.
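As a sanity check on the schedule itself (not on the kernel), here is a minimal C sketch that emulates the staggered scheme on the CPU. It assumes all rows advance one column per step in lock-step, with each row starting 2 steps after the previous one. The names `fill_staggered`, `N`, `LAG` and `step` are placeholders, and the inner loop over `row` stands in for the 4 threads; on the GPU the steps would need some form of synchronization between them, which this sketch does not model.

```c
#define N   4  /* rows and columns (placeholder name) */
#define LAG 2  /* each row starts LAG column-steps after the row above it */

/* Emulate the staggered row schedule serially.
   fill_staggered is a hypothetical helper name used only for illustration. */
void fill_staggered(int *d1, int dx)
{
    int buf[N + 1];
    int step, row, col, left, k;

    for (col = 0; col <= N; col++)
        buf[col] = 1;                        /* buf[0]..buf[4] start at 1 */

    /* The last row starts at step LAG*(N-1) and needs N more steps. */
    for (step = 0; step < LAG * (N - 1) + N; step++) {
        for (row = 0; row < N; row++) {      /* one iteration per "thread" */
            col = step - LAG * row;          /* column this row handles now */
            if (col < 0 || col >= N)
                continue;                    /* not started yet, or already finished */
            left = (col == 0) ? 1 : buf[col - 1];
            k = buf[col] + buf[col + 1] + left;
            buf[col] = k;                    /* buf[N] is never written */
            d1[row * dx + col] = k;
        }
    }
}
```

With these assumptions the sketch should produce the same 16 values as the serial loop, which suggests that a 2-column lag is workable as long as a row stops right after column 3 and the steps stay ordered. In this sketch a row stops as soon as `col` reaches `N`; whether the `k>3` test in the kernel allows one extra update of `buf_d[4]` may be worth checking.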
3dfd.cu (1.67 KB)