Synchronizing threads inside a kernel's loop

Is it possible to synchronize threads inside a control structure such as a FOR loop? I think that for this to work the loop variable (index) must be shared and its increments must be atomic, but I'm not sure. I have the same question for an IF condition. Here is an example:

for (unsigned l = 0; l < ddA; l++) {
    Jds[ty] = dkm[ty*ddA + l];

    if ((tx % JL) == 0) {
        dy = tx / JL;  // ==> I don't know if this will work correctly or not:
                       // tx and JL are unsigned integers and tx = p*JL,
                       // where p is another unsigned integer.

        for (unsigned int i = 0; i < JL; i++) {
            Xs[ty][dy] = X[Jds[ty]*JL + dy];







I don't think there's any problem with the line of code you commented on. As for syncing inside loops (as you are doing inside the outer loop), the program will hang if not all threads in a threadblock go through the same iterations. For example, in your case, if some threads never enter the loop because their local value of ddA is 0 or negative, you should expect a hang.

Loop control variables don't have to be shared, and atomic operations aren't needed either. The only thing to keep in mind is that ALL threads in a threadblock must execute a particular __syncthreads() call, or your program will hang.
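To make that rule concrete, here is a minimal sketch of one way to structure such a loop. It is not your kernel: the names tileSum, runTileSum and TILE are made up for illustration, and it assumes a single block of TILE threads. The loop bound depends only on n, which is the same for every thread, so all threads run the same number of iterations. The per-thread work sits inside an if, while the __syncthreads() calls stay outside it.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int TILE = 64;  // threads per block (an assumption for this sketch)

// Hypothetical kernel: one block sums n floats, TILE values at a time,
// staging each chunk in shared memory.
__global__ void tileSum(const float* in, float* out, int n)
{
    __shared__ float tile[TILE];
    const int tx = threadIdx.x;
    float acc = 0.0f;

    // The bound depends only on n, so every thread in the block executes
    // the same number of iterations. The loop counter is an ordinary
    // per-thread variable: not shared, no atomics.
    for (int base = 0; base < n; base += TILE) {
        const int idx = base + tx;

        // Divergent work is fine: out-of-range threads just store 0.
        tile[tx] = (idx < n) ? in[idx] : 0.0f;

        __syncthreads();  // reached by ALL threads in every iteration

        // Only thread 0 does this part. A __syncthreads() placed inside
        // this if would be reached by one thread only, which is exactly
        // the situation that can hang.
        if (tx == 0) {
            for (int i = 0; i < TILE; ++i)
                acc += tile[i];
        }

        __syncthreads();  // keep tile intact until thread 0 has read it
    }

    if (tx == 0)
        *out = acc;
}

// Hypothetical host helper: fills the input with ones, runs the kernel
// on a single block, and returns the sum. Error checking is omitted.
float runTileSum(int n)
{
    std::vector<float> host(n, 1.0f);
    float* dIn = nullptr;
    float* dOut = nullptr;
    float result = -1.0f;

    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, sizeof(float));
    cudaMemcpy(dIn, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    tileSum<<<1, TILE>>>(dIn, dOut, n);

    // This blocking copy on the default stream waits for the kernel.
    cudaMemcpy(&result, dOut, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

Calling runTileSum(150) from main should give back the number of input elements, since every element is 1.0f. If the trip count in your own kernel can differ from thread to thread, a common workaround is to loop up to a block-wide maximum and guard the per-thread work with an if, leaving the barrier unconditional as above.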