Help Race conditions2

Please, can someone explain to me in simple terms when race conditions occur?

if I do this:

float p;
if(p<b[threadIdx.x+off1])
p=b[threadIdx.x+off1];

if(p<b[threadIdx.x+off2])
p=b[threadIdx.x+off2];

Are there race conditions? If yes, how can I remove them?

Thanks in advance for any help.

No, because the only variable that changes (p) is private to each thread; the array b is only read.

The three kinds of possible race conditions (read after write, write after read, write after write) are described in the Wikipedia article on data hazards. (Never mind that the article is about pipeline design. Just think of any instructions whose order is not guaranteed.)
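
As a minimal sketch of why the snippet in the question is race-free: each thread below only reads b and only writes its own private p and its own output slot. The kernel name private_max, the out array, the host helper run_private_max, and the initial value of p are assumptions added for illustration (the original snippet leaves p uninitialized).

```cuda
#include <cuda_runtime.h>
#include <cfloat>

// Each thread reads two elements of b and keeps the larger one in a private
// register p. Nothing written here is read by another thread, so no race.
__global__ void private_max(const float *b, float *out, int off1, int off2)
{
    float p = -FLT_MAX;            // assumption: start below any real value
    int i = threadIdx.x;

    if (p < b[i + off1]) p = b[i + off1];
    if (p < b[i + off2]) p = b[i + off2];

    out[i] = p;                    // exactly one writer per output element
}

// Hypothetical host helper: runs the kernel with n threads in one block.
// h_b must hold len >= n + max(off1, off2) values; n must be <= 1024.
bool run_private_max(const float *h_b, int len, float *h_out,
                     int n, int off1, int off2)
{
    float *d_b = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_b, len * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) {
        cudaFree(d_b);
        return false;
    }
    cudaMemcpy(d_b, h_b, len * sizeof(float), cudaMemcpyHostToDevice);
    private_max<<<1, n>>>(d_b, d_out, off1, off2);
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_b);
    cudaFree(d_out);
    return err == cudaSuccess;
}
```

A race would only appear if some thread wrote to b, or if two threads wrote to the same out element.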

Hi tera, thank you very much! I have another question. Actually, what I’m trying to do is this:

I have two 3D data structures, lp1 and phi, each handled as a 1D array; their dimensions are im_r, im_c, im_d.

For each element in lp1 I find the corresponding element in phi and look for the minimum among the values of its neighbours.

This is my code:

a = lp1[index];
if (a > 0)
{
    p = compute_pmin_var(phi, label, 0, im_r, im_c, im_d, index, idx, idy);
    if (p <= 0.5)
    {
        phi[index] = p + 1;
        if (phi[index] <= 0.5)
        {
            lp1[index] = 0;
            sz[index] = 1;
            label[index] = 0;
        }
    }
}

__device__ float compute_pmin_var(float *phi, char *label, int level, int im_r, int im_c, int im_d, int index, int idx, int idy)
{
    int ind_y = idx - (idx / im_r) * im_r;   // equivalent to idx % im_r
    int ind_x = idx / im_r;
    int ind_z = idy;
    float p = 3;

    // Neighbours at index +/- 1
    if (ind_y + 1 < im_r && label[index + 1] <= level && p > phi[index + 1])
        p = phi[index + 1];
    if (ind_y - 1 >= 0 && label[index - 1] <= level && p > phi[index - 1])
        p = phi[index - 1];

    // Neighbours at index +/- im_r
    if (ind_x + 1 < im_c && label[index + im_r] <= level && p > phi[index + im_r])
        p = phi[index + im_r];
    if (ind_x - 1 >= 0 && label[index - im_r] <= level && p > phi[index - im_r])
        p = phi[index - im_r];

    // Neighbours at index +/- im_r*im_c
    if (ind_z + 1 < im_d && label[index + (im_r*im_c)] <= level && p > phi[index + (im_r*im_c)])
        p = phi[index + (im_r*im_c)];
    if (ind_z - 1 >= 0 && label[index - (im_r*im_c)] <= level && p > phi[index - (im_r*im_c)])
        p = phi[index - (im_r*im_c)];

    return p;
}

Do you think this is a good way to do it, or are there race conditions or data dependencies that I have not considered?

Again, many thanks for your help.

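Obviously this won’t work, as phi and label are both read and written by different threads without synchronization: a thread may read a neighbour’s value either before or after that neighbour has updated it. Use different arrays for the input and output of your computation.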

P.S.: Code posted to the forums is a lot more readable when posted between [code]…[/code] tags.
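
A minimal sketch of the separate input/output idea, reduced to 1D (left and right neighbours only) and without the label test to keep it short. The names neighbour_min_step, phi_in, phi_out and the host helper are assumptions for illustration, not the original poster's code; the 3D version would add the ±im_r and ±im_r*im_c neighbours in the same way.

```cuda
#include <cuda_runtime.h>

// Reads only phi_in and writes only phi_out, so no thread can observe a
// value that another thread writes during the same launch.
__global__ void neighbour_min_step(const float *phi_in, float *phi_out, int n)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index >= n) return;

    float p = 3.0f;   // same sentinel value as in the posted code
    if (index + 1 < n  && p > phi_in[index + 1]) p = phi_in[index + 1];
    if (index - 1 >= 0 && p > phi_in[index - 1]) p = phi_in[index - 1];

    // Update rule from the post; otherwise carry the old value over
    phi_out[index] = (p <= 0.5f) ? p + 1.0f : phi_in[index];
}

// Hypothetical host helper: one step from h_in to h_out.
bool run_neighbour_min_step(const float *h_in, float *h_out, int n)
{
    float *d_in = nullptr, *d_out = nullptr;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    neighbour_min_step<<<blocks, threads>>>(d_in, d_out, n);

    cudaError_t err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess;
}
```

For an iterative scheme, the two device pointers would be swapped between launches, so the output of one step becomes the input of the next.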

To be honest, I tried it and it works… so why? Moreover, outside the function (compute_pmin_var), each thread updates the value of phi and then reads it, but the index is the same for each read and write. Does this matter?

Thank you

It probably works (most of the time) if you launch it with a small enough number of blocks that all blocks run concurrently. I’d guess it fails with a larger number of blocks. Are you checking the results against a serial implementation on the CPU?

Having the same thread write and then read an array element is no problem: memory access ordering is guaranteed within a thread. The problems appear if one thread writes and a different thread reads (or the other way around).
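
To illustrate the difference between the two cases, here is a small sketch using shared memory within a single block. The kernel neighbour_sum, the BLOCK constant and the host helper are made up for this example. Reading back your own write needs no synchronization; reading a slot written by another thread is only safe after a barrier.

```cuda
#include <cuda_runtime.h>

#define BLOCK 64   // assumption: launched with exactly BLOCK threads, 1 block

__global__ void neighbour_sum(const int *in, int *out)
{
    __shared__ int tile[BLOCK];
    int t = threadIdx.x;

    tile[t] = in[t];     // each thread writes its own slot
    int own = tile[t];   // same thread reads it back: always sees its write

    __syncthreads();     // barrier: every thread in the block has written

    // Slot written by a *different* thread: safe only after the barrier
    int next = tile[(t + 1) % BLOCK];
    out[t] = own + next;
}

// Hypothetical host helper: h_in and h_out must hold BLOCK ints.
bool run_neighbour_sum(const int *h_in, int *h_out)
{
    int *d_in = nullptr, *d_out = nullptr;
    size_t bytes = BLOCK * sizeof(int);
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    neighbour_sum<<<1, BLOCK>>>(d_in, d_out);
    cudaError_t err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess;
}
```

Note that __syncthreads() only synchronizes threads of the same block. It does not help when the writer and the reader are in different blocks, which is the situation with a large grid working in place on global memory.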

Thank you, tera.

Yes, I compared it with a sequential implementation and it works. So, if I use two different data structures for both phi and label, do I avoid both race conditions and synchronization issues?

And what about a case like this:

if ((ind_y + 1 < im_r) && (abs(label[index + 1]) == 3) && (phi[index + 1] < 0))
{
    label[index + 1] = -1;
    phi[index + 1] = -1;
    ln1[index + 1] = 1;
}

if ((ind_y - 1 >= 0) && (abs(label[index - 1]) == 3) && (phi[index - 1] < 0))
{
    label[index - 1] = -1;
    phi[index - 1] = -1;
    ln1[index - 1] = 1;
}

What kind of problems can I have here? Race conditions? Synchronization issues?

Thank you very much.

Does the __threadfence() function help in this case?
And do you think it would be better to use different memory buffers for the input and output of the computation, or to use atomic operations?

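The __threadfence*() functions are almost never the answer to a race condition. They are not thread barriers like __syncthreads(): a fence only orders the calling thread’s own memory accesses as seen by other threads; it does not make any thread wait for another.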

Thank you, sibert.

Please, can you explain whether in these cases I have synchronization problems or race conditions?

In the two examples I have posted, when do I need atomic operations? Is it better to use atomic operations, or to duplicate the structures so that there is one memory location for the input and one for the output?

Usually it is better to have separate memory for input and output, as atomic operations are expensive. And for floating point data, atomic operations create the additional nuisance that rounding errors suddenly depend on the specific timing of each execution.
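
A small sketch of the rounding point: floating-point addition is not associative, so the result of an atomic accumulation depends on the order in which the hardware happens to apply the updates. The helper sum3 and the kernel atomic_sum are made-up names for illustration.

```cuda
#include <cuda_runtime.h>

// Adds three floats strictly left to right; usable on host and device.
__host__ __device__ inline float sum3(float a, float b, float c)
{
    float s = a + b;   // intermediate result is rounded to float here
    return s + c;
}

// Every thread adds its element to *total. The order of the additions is
// whatever the scheduler produces, so the rounded result can differ from
// run to run for the same input.
__global__ void atomic_sum(const float *x, float *total, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(total, x[i]);
}
```

With the same three values, sum3(1e8f, -1e8f, 1.0f) gives 1.0f while sum3(1e8f, 1.0f, -1e8f) gives 0.0f, because 1e8f + 1.0f rounds back to 1e8f in single precision. A kernel that writes each result to its own output element has no such order dependence.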