Hello!
I have a problem when trying to synchronize some threads.
I have a global variable, let’s say __device__ int *a.
In the first part of the kernel’s execution I fill that array with values, each thread writing one element ( a[blockIdx.x * blockDim.x + threadIdx.x] = value ).
After this step I want to synchronize all threads so that each one can see the values written by the others.
So I used __syncthreads().
To test this, I compute the sum of all the elements on the host.
When I do not call __syncthreads() in the kernel, the sum is computed correctly.
But when I call it after the fill step, I get a completely wrong sum.
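To make the setup concrete, here is a minimal sketch of the pattern described above. It is not my exact code: the fixed-size array (used instead of a pointer to keep it short), the names fill and fill_and_sum, the grid sizes, and the placeholder value idx are all assumptions made for illustration. It also assumes a toolkit that provides cudaDeviceSynchronize(); older toolkits use cudaThreadSynchronize() instead. Note that __syncthreads() is a barrier only for the threads of one block, not for the whole grid.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N_BLOCKS  4
#define N_THREADS 64
#define N (N_BLOCKS * N_THREADS)

// Global device array (hypothetical fixed size, for illustration only).
__device__ int a[N];

__global__ void fill(void)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    a[idx] = idx;      // placeholder for "value"

    // Barrier for the threads of THIS block only; threads in other
    // blocks are not synchronized by this call.
    __syncthreads();

    // A second phase that reads elements written by other threads
    // would go here.
}

// Launches the kernel, copies the array back, and sums it on the host.
long fill_and_sum(void)
{
    int h[N];
    long sum = 0;

    fill<<<N_BLOCKS, N_THREADS>>>();
    cudaDeviceSynchronize();                 // wait for the kernel to finish
    cudaMemcpyFromSymbol(h, a, sizeof(h));   // device -> host copy

    for (int i = 0; i < N; ++i)
        sum += h[i];
    return sum;
}

int main(void)
{
    long expected = (long)N * (N - 1) / 2;   // sum of 0 .. N-1
    printf("sum = %ld, expected = %ld\n", fill_and_sum(), expected);
    return 0;
}
```

In this sketch every thread reaches __syncthreads() unconditionally, so I would expect the two printed values to match whether or not the barrier is present.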
Maybe there is another way to synchronize threads on a global variable rather than on a per-block shared one, or maybe my tests are at fault; I don’t know.
I should mention that I can only test in EmuDebug mode, so maybe that is the problem.
Thanks for any kind of help.