Does __syncthreads() synchronize global memory writes between threads in the same block? According to the programming guide (bottom of page 28 in v2.0 of the guide), I’d say so. Since my kernel needs a lot of shared memory, I’m using an array in global memory for temporary storage, as follows (the code is completely artificial, I’m just trying to give an example):
float* foo; // <- this is in shared memory
float* bar; // <- this is in global memory
bar[threadIdx.x] = foo[threadIdx.x]; // (1)
__syncthreads();
if (threadIdx.x > 42)
    foo[threadIdx.x] = bar[threadIdx.x - 42]; // (2)
else
    foo[threadIdx.x] = 0;
But the end results of the calculations are wrong, and after a lot of debugging I came to the conclusion that __syncthreads() does not seem to guarantee that all the writes at (1) have completed, so the reads at (2) get wrong values…
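In case it helps to reproduce this, here is a minimal, self-contained sketch of the same pattern. It is not my real kernel: the kernel name `shiftViaGlobal`, the host helper `countMismatches`, the constants, and the input data are all made up for illustration. It also assumes that each block works on its own slice of `bar`. As far as I understand, __syncthreads() only synchronizes threads within one block, so if several blocks wrote to the same elements of `bar` they would race with each other no matter what.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreads = 128;  // threads per block (assumed value)
constexpr int kShift   = 42;   // offset from the snippet above

// Hypothetical kernel: stages a shared-memory tile in global scratch memory.
__global__ void shiftViaGlobal(const float* in, float* out, float* bar)
{
    __shared__ float foo[kThreads];
    const int tid = threadIdx.x;
    const int gid = blockIdx.x * blockDim.x + tid;

    // Assumption: every block gets its own slice of bar, so blocks never
    // overwrite each other's temporary values.
    float* myBar = bar + blockIdx.x * blockDim.x;

    foo[tid] = in[gid];        // each thread touches only its own element here
    myBar[tid] = foo[tid];     // (1)
    __syncthreads();           // barrier for all threads of this block
    if (tid > kShift)
        foo[tid] = myBar[tid - kShift];  // (2) reads another thread's write
    else
        foo[tid] = 0.0f;
    out[gid] = foo[tid];
}

// Hypothetical host helper: returns the number of wrong elements,
// or -1 if a CUDA call failed.
int countMismatches()
{
    const int blocks = 4;
    const int n = blocks * kThreads;
    const size_t bytes = n * sizeof(float);

    std::vector<float> hIn(n), hOut(n);
    for (int i = 0; i < n; ++i)
        hIn[i] = static_cast<float>(i + 1);

    float *dIn = nullptr, *dOut = nullptr, *dBar = nullptr;
    if (cudaMalloc(&dIn, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&dOut, bytes) != cudaSuccess) { cudaFree(dIn); return -1; }
    if (cudaMalloc(&dBar, bytes) != cudaSuccess) { cudaFree(dIn); cudaFree(dOut); return -1; }

    cudaError_t err = cudaMemcpy(dIn, hIn.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        shiftViaGlobal<<<blocks, kThreads>>>(dIn, dOut, dBar);
        // The blocking copy also waits for the kernel to finish.
        err = cudaMemcpy(hOut.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dIn);
    cudaFree(dOut);
    cudaFree(dBar);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int i = 0; i < n; ++i) {
        const int tid = i % kThreads;
        const float expected = (tid > kShift) ? hIn[i - kShift] : 0.0f;
        if (hOut[i] != expected) ++bad;
    }
    return bad;
}
```

Calling `countMismatches()` from `main` and printing the result should be enough to check whether the reads at (2) see the writes from (1) on a given setup.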
Thanks.