A for loop inside an if block causes a crash?

[codebox]
// p is fixed and smaller than the number of threads in the block
for (eleIndex = 1; eleIndex < p; eleIndex *= 2)
{
    middle = 1 - middle;   // buffer written in this pass
    right = 1 - middle;    // buffer read in this pass
    __syncthreads();
    if (threadIdx.x < p)   // with the if here, inside the loop, it works fine
    {
        scan[middle*p+threadIdx.x] = scan[right*p+threadIdx.x];
        if (threadIdx.x >= eleIndex)
            scan[middle*p+threadIdx.x] += scan[right*p+threadIdx.x-eleIndex];
    }
}
__syncthreads();
[/codebox]

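[codebox]
// p is fixed and smaller than the number of threads in the block
if (threadIdx.x < p)   // with the if here, outside the for loop, it crashes
{
    for (eleIndex = 1; eleIndex < p; eleIndex *= 2)
    {
        middle = 1 - middle;   // buffer written in this pass
        right = 1 - middle;    // buffer read in this pass
        __syncthreads();       // only reached by threads with threadIdx.x < p
        scan[middle*p+threadIdx.x] = scan[right*p+threadIdx.x];
        if (threadIdx.x >= eleIndex)
            scan[middle*p+threadIdx.x] += scan[right*p+threadIdx.x-eleIndex];
    }
}
__syncthreads();
[/codebox]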

I just want some threads to run the for loop while the others wait at __syncthreads() until all threads reach the barrier. It seems that if some threads do a lot of work while the others wait too long, the program crashes. I hope somebody can tell me why. Thanks.

scan is allocated in shared memory, with enough space.

All threads in the block must participate in every __syncthreads() barrier, which isn’t the case in the second piece of code. Otherwise the threads that skip the branch never reach the barrier inside it, and the program crashes.
In other words, all threads in the block must call __syncthreads() the exact same number of times.
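To make that rule concrete, here is a minimal sketch of the working layout as a complete kernel: the loop and its barrier are executed by every thread, and only the shared-memory accesses sit under the `threadIdx.x < p` test. The names `scanFirstP` and `scanOnDevice`, the `int` element type, and the single-block launch are assumptions made for illustration; they are not taken from the original code, and error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: inclusive scan of p elements within one block.
// Assumes 1 <= p <= blockDim.x and 2 * p * sizeof(int) bytes of dynamic shared memory.
__global__ void scanFirstP(const int *in, int *out, int p)
{
    extern __shared__ int scan[];   // two buffers of p ints each
    int middle = 0;                 // buffer written in the current pass
    int right = 1;                  // buffer read in the current pass

    if (threadIdx.x < p)
        scan[middle * p + threadIdx.x] = in[threadIdx.x];

    // The loop bounds depend only on p, so every thread runs the same
    // number of iterations and reaches the same barriers.
    for (int eleIndex = 1; eleIndex < p; eleIndex *= 2)
    {
        middle = 1 - middle;
        right = 1 - middle;
        __syncthreads();            // outside the divergent branch
        if (threadIdx.x < p)
        {
            scan[middle * p + threadIdx.x] = scan[right * p + threadIdx.x];
            if (threadIdx.x >= eleIndex)
                scan[middle * p + threadIdx.x] += scan[right * p + threadIdx.x - eleIndex];
        }
    }
    __syncthreads();

    if (threadIdx.x < p)
        out[threadIdx.x] = scan[middle * p + threadIdx.x];
}

// Hypothetical host helper: copies the input to the device, launches one block
// of `threads` threads, and returns the scanned values. No error checking.
std::vector<int> scanOnDevice(const std::vector<int> &h_in, int threads)
{
    int p = static_cast<int>(h_in.size());
    size_t bytes = p * sizeof(int);
    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    scanFirstP<<<1, threads, 2 * bytes>>>(d_in, d_out, p);
    std::vector<int> h_out(p);
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Calling `scanOnDevice` with five input values and 32 threads leaves 27 threads with nothing to scan, yet they still execute every `__syncthreads()` in the loop, so each thread in the block issues the same number of barrier calls.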

N.

Thanks for your reply. However, I am still confused.

[codebox]
if (threadIdx.x < p)
{
    for (...)
    {
        __syncthreads();  // inner
    }
}
__syncthreads();  // outer
[/codebox]

Threads whose ID is p or larger can’t reach the inner __syncthreads(), but they can wait at the outer one, am I right? If so, they will wait at the outer barrier until the threads with threadIdx.x < p finish the inner work. So why does this code crash?

All threads in a block must reach the same syncthreads barrier, not just any syncthreads.

Suppose you have this:

[codebox]
if (threadIdx.x < p)
{
    for (int i=0;i<4;++i)
    {
        __syncthreads();  // inner
    }
}
__syncthreads();  // outer
[/codebox]

The first p threads will issue a total of 5 __syncthreads() calls while the remaining threads only issue a single __syncthreads() call.

The first p threads will execute the inner loop, but the remaining threads will jump straight to the last __syncthreads() call and wait there for the first p threads to arrive at a __syncthreads().

So in fact, the threads with (threadIdx.x >= p) are synced with the first call to __syncthreads() that the first p threads make in the inner loop.

After that, the first p threads issue another __syncthreads() call, but the remaining threads never perform another call to __syncthreads() and therefore never arrive at the synchronization barrier.

N.

Thank you, Nico. Your answer is perfect.