This thread continues to make me uneasy. Attempting to synchronize accesses to global memory is dangerous and unsupported.
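__syncthreads() is only a barrier instruction for the threads of a single block (i.e. no thread in the block may proceed until all threads of that block reach the barrier). It does not trigger a flush or write to global memory, and it provides no synchronization or ordering between blocks.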
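To illustrate the block-level scope of the barrier, here is a minimal sketch in which each block reverses its own slice of an array through shared memory. The names `TILE`, `reverse_within_block` and `run_block_reverse_demo` are hypothetical, and the launch assumes exactly `TILE` threads per block; it is an illustration, not a recommended pattern for cross-block communication.

```cuda
#include <vector>
#include <cuda_runtime.h>

#define TILE 256  // hypothetical block size, chosen only for this sketch

// Each block reverses its own TILE-element slice through shared memory.
__global__ void reverse_within_block(const int *in, int *out)
{
    __shared__ int tile[TILE];
    int base = blockIdx.x * TILE;

    tile[threadIdx.x] = in[base + threadIdx.x];

    // Barrier for this block only: every thread of the block has stored its
    // element before any thread of the block reads a neighbour's element.
    __syncthreads();

    out[base + threadIdx.x] = tile[TILE - 1 - threadIdx.x];
}

// Host-side driver; returns true if every block reversed its slice correctly.
bool run_block_reverse_demo()
{
    const int blocks = 4;
    const int n = blocks * TILE;
    std::vector<int> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = i;

    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }

    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    reverse_within_block<<<blocks, TILE>>>(d_in, d_out);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    // Element t of block b should now hold the value from position TILE-1-t.
    for (int b = 0; b < blocks; ++b)
        for (int t = 0; t < TILE; ++t)
            if (h_out[b * TILE + t] != b * TILE + (TILE - 1 - t)) return false;
    return true;
}
```

Nothing here relies on one block seeing another block's data: all communication stays inside a block's shared memory, which is the case the barrier is meant for.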
I assume I can safely and consistently read and write into the same global array partitioned across blocks, as long as the reads and writes go to disjoint memory addresses. Is that right?
What if I want all my writes to become globally visible to other reads? Will going back to the CPU and invoking the kernel again ensure that all writes from the previous kernel are complete and globally visible? Is there a better way to do it?
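For concreteness, the relaunch pattern I have in mind looks roughly like the sketch below. The kernel and function names (`fill_squares`, `read_mirrored`, `run_two_pass_demo`) are made up for illustration. It assumes both kernels are launched into the same (default) stream, where launches execute in order, so the second kernel starts only after the first has finished.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Pass 1: every thread writes exactly one element, so all writes in the
// grid go to disjoint addresses.
__global__ void fill_squares(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i * i;
}

// Pass 2: every thread reads an element that was written by a different
// thread (usually in a different block) during the previous launch.
__global__ void read_mirrored(const int *in, int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[n - 1 - i];
}

// Host-side driver; returns true if pass 2 observed all writes of pass 1.
bool run_two_pass_demo()
{
    const int n = 1 << 12;   // 4096 elements, so i * i fits in an int
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    int *d_a = nullptr, *d_b = nullptr;
    if (cudaMalloc(&d_a, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&d_b, n * sizeof(int)) != cudaSuccess) {
        cudaFree(d_a);
        return false;
    }

    // Two launches in the same stream: the kernel boundary is the only
    // grid-wide synchronization point used here.
    fill_squares<<<blocks, threads>>>(d_a, n);
    read_mirrored<<<blocks, threads>>>(d_a, d_b, n);

    // cudaMemcpy on the default stream waits for both kernels to finish.
    std::vector<int> h(n);
    cudaError_t err = cudaMemcpy(h.data(), d_b, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) {
        int j = n - 1 - i;
        if (h[i] != j * j) return false;
    }
    return true;
}
```

Within each kernel, threads only touch disjoint addresses; the cross-block reads happen in a separate launch.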
Thanks,
Mike