__syncthreads() and global memory

Does __syncthreads() synchronize global memory writes between threads in the same block? According to the programming guide (at the bottom of page 28 in v2.0 of the guide), I’d say so. So, as my kernel requests lots of shared memory, I’m using an array in global memory for temporary storage, as follows (the code is completely artificial, I’m just trying to give an example):

float* foo; // <- this is in shared memory
float* bar; // <- this is in global memory

bar[threadIdx.x] = foo[threadIdx.x]; // (1)
__syncthreads();
if (threadIdx.x > 42)
    foo[threadIdx.x] = bar[threadIdx.x - 42]; // (2)
else
    foo[threadIdx.x] = 0;

But the end results of the calculations are wrong, and after lots of debugging I came to the conclusion that __syncthreads() does not seem to guarantee that the writes at (1) have all completed, so the reads at (2) get wrong values…

Thanks.

It is a basic barrier for each thread block, so that you can use shared variables between threads in one block. If you want one thread to read data from shared memory, but you depend on another thread finishing the computation whose result is stored in that memory space, then you need a synchronization point before you read that value to be certain it is correct.
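
To make the barrier pattern described above concrete, here is a minimal sketch, not the original poster's kernel. It assumes a single block of 128 threads and reuses the offset of 42 from the question. The names shift_kernel, run_shift_demo, BLOCK_SIZE and SHIFT are hypothetical, chosen only for this illustration. Each thread writes one element to shared memory, waits at the barrier, and then reads an element written by a different thread of the same block.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK_SIZE 128  // hypothetical block size for this sketch
#define SHIFT 42        // offset taken from the example in the question

// Each thread stores one value in shared memory, waits at the barrier,
// then reads a value that another thread of the same block stored.
__global__ void shift_kernel(const float* in, float* out)
{
    __shared__ float tile[BLOCK_SIZE];
    unsigned int t = threadIdx.x;

    tile[t] = in[t];   // every thread writes its own element
    __syncthreads();   // no thread continues until all threads in the block arrive

    // Safe to read an element written by a different thread of this block.
    out[t] = (t > SHIFT) ? tile[t - SHIFT] : 0.0f;
}

// Hypothetical host helper: launches one block and returns the number of
// output elements that differ from the value computed on the host.
int run_shift_demo()
{
    float h_in[BLOCK_SIZE], h_out[BLOCK_SIZE];
    for (int i = 0; i < BLOCK_SIZE; ++i) {
        h_in[i] = (float)i;
        h_out[i] = -1.0f;  // sentinel, overwritten by the copy back from the device
    }

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    shift_kernel<<<1, BLOCK_SIZE>>>(d_in, d_out);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    int mismatches = 0;
    for (int i = 0; i < BLOCK_SIZE; ++i) {
        float expected = (i > SHIFT) ? h_in[i - SHIFT] : 0.0f;
        if (h_out[i] != expected) ++mismatches;
    }
    return mismatches;
}

int main()
{
    int bad = run_shift_demo();
    printf("mismatches: %d\n", bad);
    return bad != 0;
}
```

Without the __syncthreads() call, a thread could read tile[t - SHIFT] before the thread responsible for that element has written it. Note that the barrier only covers threads of the same block; it does not order anything between different blocks.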