CUDA Memory Consistency

This thread continues to make me uneasy. Attempting to synchronize accesses to global memory is dangerous and unsupported.

__syncthreads() does not trigger any flush, write to global memory, or anything other than a barrier instruction (i.e. no thread in the block may proceed until all threads in the block reach the barrier).

Proceed at your own risk. :)

Mark

I have a couple of follow-up questions:

  1. I assume I can safely and consistently read and write into the same global array partitioned across blocks, as long as the reads and writes go to disjoint memory addresses. Is that right?
  2. What if I want all my writes to become globally visible to other reads? Will going back to the CPU and invoking the kernel again ensure that all writes from the previous kernel are complete and globally visible? Is there a better way to do it?

Thanks,
Mike

  1. Yes. You can even have blocks read and write the same locations or regions if you want; you just can't synchronize on them within a single kernel.

  2. Yes. There isn't really a better way.

Mark
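Mark's two answers can be illustrated with a host-side analogy. The sketch below is plain C++, not CUDA: std::thread workers stand in for thread blocks, a std::vector stands in for the global array, and joining every worker stands in for the kernel finishing before the CPU launches the next one. The names run_phase and two_phase_neighbor_sums and the sizes are hypothetical. In phase 1 each "block" writes only its own disjoint slice; in phase 2 each "block" reads another block's slice, which is safe only because every phase-1 write completed first.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical layout: 4 "blocks", each owning 256 consecutive elements.
constexpr std::size_t kNumBlocks = 4;
constexpr std::size_t kBlockSize = 256;

// Stand-in for "launch a kernel, then wait for it on the host":
// start one thread per block and join them all before returning.
void run_phase(const std::function<void(std::size_t)>& body) {
    std::vector<std::thread> workers;
    for (std::size_t b = 0; b < kNumBlocks; ++b) {
        workers.emplace_back(body, b);
    }
    for (auto& w : workers) {
        w.join();  // all of the worker's writes are visible after join()
    }
}

// Returns, for every block, the sum of its right-hand neighbour's slice.
std::vector<long> two_phase_neighbor_sums() {
    std::vector<int> global_mem(kNumBlocks * kBlockSize, 0);
    std::vector<long> sums(kNumBlocks, 0);

    // Phase 1 ("kernel 1"): each block writes only its own disjoint slice,
    // so no two workers ever touch the same address.
    run_phase([&](std::size_t b) {
        for (std::size_t i = 0; i < kBlockSize; ++i) {
            global_mem[b * kBlockSize + i] = static_cast<int>(b + 1);
        }
    });

    // Phase 2 ("kernel 2"): each block reads another block's slice. This is
    // safe only because phase 1 finished completely before phase 2 started.
    run_phase([&](std::size_t b) {
        const std::size_t nb = (b + 1) % kNumBlocks;
        long sum = 0;
        for (std::size_t i = 0; i < kBlockSize; ++i) {
            sum += global_mem[nb * kBlockSize + i];
        }
        sums[b] = sum;  // again, one slot per block: no overlap
    });
    return sums;
}
```

Calling two_phase_neighbor_sums() returns {512, 768, 1024, 256}: block b sees 256 copies of its neighbour's value (b + 2, wrapping to 1 for the last block). Merging both phases into a single launch would reintroduce exactly the intra-kernel synchronization Mark warns against.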

Has anyone tried implementing a mutex using Lamport's bakery algorithm (http://en.wikipedia.org/wiki/Lamport’s_b...f_the_algorithm) or similar?

http://research.microsoft.com/~gurevich/Opera/107.pdf proves its correctness even with non-atomic memory operations.

I will eventually try this and post some source unless someone beats me to it.
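For reference, here is a minimal CPU-side sketch of the bakery lock in C++. It is not GPU code, was not posted in this thread, and does not show that the approach works inside a CUDA kernel. Each thread takes a ticket one larger than the largest ticket it sees, and threads enter the critical section in (ticket, id) order. The sketch assumes a fixed, hypothetical thread count and uses std::atomic with the default sequentially consistent ordering, because real hardware and compilers may reorder plain loads and stores, and the algorithm's argument assumes they are not reordered. The names BakeryLock and bakery_counter_demo are illustrative.

```cpp
#include <algorithm>
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical fixed number of participants, with ids 0..kThreads-1.
constexpr int kThreads = 4;

// Lamport's bakery lock. All shared variables are seq_cst atomics so the
// accesses the algorithm depends on cannot be reordered.
class BakeryLock {
public:
    BakeryLock() {
        for (int i = 0; i < kThreads; ++i) {
            choosing_[i].store(false);
            number_[i].store(0);
        }
    }

    void lock(int id) {
        // Doorway: take a ticket one larger than any ticket currently held.
        choosing_[id].store(true);
        unsigned max_ticket = 0;
        for (int j = 0; j < kThreads; ++j) {
            max_ticket = std::max(max_ticket, number_[j].load());
        }
        const unsigned mine = max_ticket + 1;
        number_[id].store(mine);
        choosing_[id].store(false);

        // Wait for every thread whose (ticket, id) pair is smaller than ours.
        for (int j = 0; j < kThreads; ++j) {
            if (j == id) continue;
            while (choosing_[j].load()) std::this_thread::yield();
            for (;;) {
                const unsigned tj = number_[j].load();
                const bool j_goes_first =
                    tj != 0 && (tj < mine || (tj == mine && j < id));
                if (!j_goes_first) break;
                std::this_thread::yield();
            }
        }
    }

    void unlock(int id) { number_[id].store(0); }

private:
    std::atomic<bool> choosing_[kThreads];
    std::atomic<unsigned> number_[kThreads];
};

// Each of kThreads threads increments a plain (non-atomic) counter
// `iterations` times inside the lock; returns the final count.
long bakery_counter_demo(int iterations) {
    BakeryLock bakery;
    long counter = 0;
    std::vector<std::thread> threads;
    for (int id = 0; id < kThreads; ++id) {
        threads.emplace_back([&bakery, &counter, id, iterations] {
            for (int k = 0; k < iterations; ++k) {
                bakery.lock(id);
                ++counter;  // protected by the bakery lock
                bakery.unlock(id);
            }
        });
    }
    for (auto& t : threads) t.join();
    return counter;
}
```

With four threads, bakery_counter_demo(1000) should return 4000; a lost update would show up as a smaller number. The yield() calls only keep the spin loops polite to the OS scheduler; spinning of this kind is reasonable only when every waiting participant is guaranteed to keep making progress.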

EDIT (3/8/07)

See this thread for more info:
http://forums.nvidia.com/index.php?showtop...00&#entry168900