Shared memory and __syncthreads(): one writer, n readers


In my algorithm, the threads of a block all operate on one specific element in shared memory. When they are all done, the next element is loaded.

I have reduced my problem to the following code, which stops with the message “incorrect use of __syncthreads()” in device emulation (after the first time __syncthreads() is reached in the loop). If I launch this code in normal mode, my system crashes.

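    __global__ void f()
    {
        __shared__ int keep_going;

        const int thread_id = threadIdx.y * blockDim.x + threadIdx.y;

        if (thread_id == 0)
        {
            keep_going = 1;
        }
        __syncthreads();    // first __syncthreads()

        while (keep_going != 0)
        {
            if (thread_id == 0)
            {
                keep_going = 0;
            }
            __syncthreads();    // second __syncthreads(), inside the loop
        }
    }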

I know that you have to be careful using __syncthreads() inside loops/branches, but I don’t understand the problem here, since there is no way a thread does not reach the __syncthreads().

thanks for any help : )


Well, all threads run until the first __syncthreads(); at that point keep_going = 1.

The first 32 threads (the first warp) then run until the second __syncthreads(). By then keep_going has been set to 0.

The next 32 threads start to run, but the while condition is now false (keep_going = 0), so they skip the loop body and never reach the second __syncthreads().

I didn’t consider that the threads could be serialized in a block.

thank you very much : )

I think a __syncthreads() as the first statement in the while loop will solve the race condition.
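
Here is a rough sketch of what that suggestion could look like; it is untested on the original poster's setup. The `rounds` parameter, the `iterations` output array, and the host helper `all_threads_agree` are made up for the illustration, and I used threadIdx.x in the thread index on the assumption that this is what was intended. The barrier at the top of the loop means every thread has tested keep_going before thread 0 may change it, and the barrier at the bottom makes the new value visible before anyone tests it again.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread counts how many loop iterations it executed. If the barriers
// are placed correctly, every thread in the block reports the same count.
__global__ void f(int rounds, int *iterations)
{
    __shared__ int keep_going;   // written by thread 0 only, read by all

    // Assumption: threadIdx.x is intended as the second term of the index.
    const int thread_id = threadIdx.y * blockDim.x + threadIdx.x;
    int count = 0;

    if (thread_id == 0)
        keep_going = rounds;     // hypothetical stand-in for "elements left"
    __syncthreads();             // initial value is visible to all threads

    while (keep_going != 0)
    {
        // Barrier 1: every thread has evaluated the loop condition
        // before thread 0 is allowed to modify keep_going.
        __syncthreads();

        ++count;                 // stand-in for the per-element work
        if (thread_id == 0)
            --keep_going;        // single writer

        // Barrier 2: the write is visible before the condition is re-tested.
        __syncthreads();
    }

    iterations[thread_id] = count;
}

// Hypothetical host helper: launches one block and returns true if every
// thread ran exactly `rounds` iterations.
bool all_threads_agree(int threads_x, int threads_y, int rounds)
{
    const int n = threads_x * threads_y;
    int *d_iter = nullptr;
    if (cudaMalloc(&d_iter, n * sizeof(int)) != cudaSuccess)
        return false;

    f<<<1, dim3(threads_x, threads_y)>>>(rounds, d_iter);

    std::vector<int> h_iter(n);
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(h_iter.data(), d_iter, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_iter);
    if (err != cudaSuccess)
        return false;

    for (int c : h_iter)
        if (c != rounds)
            return false;
    return true;
}

int main()
{
    // 64 x 2 = 128 threads, i.e. several warps in a single block.
    printf("%s\n", all_threads_agree(64, 2, 3) ? "ok" : "mismatch");
    return 0;
}
```

Since all threads now see the same value of keep_going each time they test the loop condition, they either all enter the loop body or all leave the loop, so no __syncthreads() is reached by only part of the block.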

I have tried the approaches mentioned above, but I always get keep_going = 1, and I didn’t get the compiler error either. It looks as if, once all the threads have reached the __syncthreads(), only the first thread is valid. Why? Confused!!

I haven’t tried this, but your problem is likely because you’re using a const for the thread_id variable. The compiler may be setting it only once, from the first thread, which would explain why thread_id always looks like it’s zero.

I know this is an old thread but I had a similar problem so thought I’d reply.