Use of __syncthreads: does it have the same meaning for global variables?

Hello!

I have a problem when trying to synchronize some threads.
I have a global variable, let’s say __device__ int *a.
In the first part of the kernel’s execution I fill that array with values, each thread writing one element ( a[blockIdx.x * blockDim.x + threadIdx.x] = value ).

After this step I want to synchronize all threads so that they can see each other’s values.
So I used __syncthreads.
To test, I computed the sum of all the elements on the host.
When I do not use __syncthreads in the kernel, the sum is computed correctly.
But when I use it after the fill step, I get a completely wrong sum.

Maybe there is another way to synchronize threads on a global variable rather than a per-block shared one, or maybe my tests are at fault, I don’t know.

I should mention that I can only test in EmuDebug mode; maybe that is the problem.

Thanks for any kind of help.

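__syncthreads() is a block-wide barrier, not a grid-wide barrier. It only makes threads of the same block wait for each other; threads in other blocks are not affected.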
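To illustrate the block-wide scope, here is a minimal sketch (not the original poster's code). The kernel name fill_then_read_in_block, the helper count_in_block_mismatches and the 4 x 64 launch configuration are made up for the example. Each thread reads a value written by a neighbour in its own block, which is the only case the barrier covers.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread writes one element of the global array, then reads the element
// written by its right-hand neighbour in the SAME block (wrapping in the block).
__global__ void fill_then_read_in_block(int *a, int *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    a[i] = i;

    // Barrier for the threads of this block only. Writes made before it,
    // including writes to global memory, are visible to the block afterwards.
    __syncthreads();

    int neighbour = blockIdx.x * blockDim.x + (threadIdx.x + 1) % blockDim.x;
    out[i] = a[neighbour];
}

// Hypothetical helper: runs the kernel and counts wrong values on the host.
int count_in_block_mismatches()
{
    const int blocks = 4, threads = 64, n = blocks * threads;  // assumed sizes
    int *a = nullptr, *out = nullptr;
    cudaMalloc(&a, n * sizeof(int));
    cudaMalloc(&out, n * sizeof(int));

    fill_then_read_in_block<<<blocks, threads>>>(a, out);

    std::vector<int> h(n);
    cudaMemcpy(h.data(), out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(a);
    cudaFree(out);

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        int expected = (i / threads) * threads + (i % threads + 1) % threads;
        if (h[i] != expected) ++mismatches;
    }
    return mismatches;
}

int main()
{
    printf("mismatches: %d\n", count_in_block_mismatches());
    return 0;
}
```

If neighbour pointed into a different block instead, the barrier would give no guarantee that the value had been written yet.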

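If you want to sync across all threads of your kernel, use a second kernel launch: a kernel launched after the first one in the same stream does not start until the first has finished, so it sees every value written to global memory. Kernel launches are pretty cheap, only about 15us, so don’t be scared to use them when needed.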
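A minimal sketch of the two-launch approach, assuming the array is allocated with cudaMalloc and each element is filled with its own index. The names fill, read_other_block and sum_after_two_launches are hypothetical and chosen only for this example.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Phase 1: every thread fills its own element of the global array.
__global__ void fill(int *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] = i;
}

// Phase 2: every thread reads an element one block-width further on,
// which belongs to a different block whenever n spans several blocks.
__global__ void read_other_block(const int *a, int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[(i + blockDim.x) % n];
}

// Hypothetical helper: runs both phases and returns the host-side sum of out.
long long sum_after_two_launches(int n, int threads)
{
    int *a = nullptr, *out = nullptr;
    cudaMalloc(&a, n * sizeof(int));
    cudaMalloc(&out, n * sizeof(int));

    int blocks = (n + threads - 1) / threads;
    // Both launches go to the default stream, so the second kernel starts
    // only after the first has completed. That boundary acts as the
    // grid-wide synchronization point.
    fill<<<blocks, threads>>>(a, n);
    read_other_block<<<blocks, threads>>>(a, out, n);

    // cudaMemcpy waits for the preceding kernels before copying.
    std::vector<int> h(n);
    cudaMemcpy(h.data(), out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(a);
    cudaFree(out);

    long long sum = 0;
    for (int v : h) sum += v;
    return sum;
}

int main()
{
    const int n = 1024;  // assumed size for the example
    long long expected = (long long)n * (n - 1) / 2;  // sum of 0..n-1
    printf("sum = %lld, expected = %lld\n",
           sum_after_two_launches(n, 256), expected);
    return 0;
}
```

Because the second kernel only permutes the values written by the first, the host-side sum should match the sum of the filled values. That gives the same kind of check as the one described in the question.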