CUDA Memory Consistency

Mark_Harris · March 6, 2007, 7:07pm

This thread continues to make me uneasy. Attempting to synchronize accesses to global memory is dangerous and unsupported.

__syncthreads() does not trigger any flush, write to global memory, or anything other than a barrier instruction (i.e. no thread in the block may proceed until all threads in the block reach the barrier).
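Here is a minimal sketch of what the barrier is for, assuming a current CUDA toolkit. The threads of one block cooperate through shared memory, and __syncthreads() keeps any thread from reading tile[] before every thread in its block has written its element. The kernel and helper names and the block size kBlock are hypothetical. Later CUDA documentation also states that memory accesses made before the barrier are visible to the other threads of the same block. It still provides no ordering between different blocks.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical block size for this sketch.
constexpr int kBlock = 256;

// Each block reverses its own kBlock-element slice of `data` in place.
// The barrier is needed because thread t reads the shared-memory element
// written by a *different* thread (kBlock - 1 - t) of the same block.
// It does nothing to coordinate with other blocks.
__global__ void reverseEachBlock(int* data) {
    __shared__ int tile[kBlock];
    int base = blockIdx.x * kBlock;
    int t = threadIdx.x;
    tile[t] = data[base + t];              // stage this block's slice
    __syncthreads();                       // wait until all of tile[] is written
    data[base + t] = tile[kBlock - 1 - t];
}

// Host helper: reverses `blocks` slices and checks the result on the CPU.
bool reverseDemo(int blocks) {
    int n = blocks * kBlock;
    int* h = new int[n];
    for (int i = 0; i < n; ++i) h[i] = i;
    int* d = nullptr;
    cudaMalloc(&d, n * sizeof(int));
    cudaMemcpy(d, h, n * sizeof(int), cudaMemcpyHostToDevice);
    reverseEachBlock<<<blocks, kBlock>>>(d);
    // Blocking copy on the default stream: returns after the kernel finishes.
    bool ok = cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int i = 0; ok && i < n; ++i) {
        int b = i / kBlock, t = i % kBlock;
        ok = h[i] == b * kBlock + (kBlock - 1 - t);
    }
    cudaFree(d);
    delete[] h;
    return ok;
}
```

On a working GPU, `assert(reverseDemo(4));` should hold.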

Proceed at your own risk. :)

Mark

msmelyan · March 6, 2007, 9:57pm

I have a couple of follow-up questions:

1. I assume I can safely and consistently read and write the same global array, partitioned across blocks, as long as the reads and writes go to disjoint memory addresses. Is that right?
2. What if I want all my writes to become globally visible to subsequent reads? Will returning to the CPU and launching the kernel again ensure that all writes from the previous kernel are complete and globally visible? Is there a better way to do it?
Thanks,
Mike

Mark_Harris · March 7, 2007, 6:51pm

1. Yes. You can even have blocks read and write the same locations or regions if you want; you just can’t synchronize on them within a single kernel.
2. Yes. There isn’t really a better way.
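Here is a minimal sketch of both answers, assuming a current CUDA toolkit and hypothetical kernel and helper names. In the first launch each block writes only its own slice, so no two blocks touch the same address. The second launch then reads a neighbouring block's slice, which is safe only because the kernel boundary separates the two passes. Launches issued to the same (default) stream run in order, so the host does not need to wait between them. It only has to synchronize (here via the blocking cudaMemcpy) before reading results on the CPU.

```cuda
#include <cassert>
#include <cuda_runtime.h>

constexpr int kThreads = 128;  // hypothetical block size

// Pass 1: block b writes only its own slice [b*kThreads, (b+1)*kThreads),
// so no two blocks ever touch the same address.
__global__ void fillOwnSlice(int* buf) {
    int i = blockIdx.x * kThreads + threadIdx.x;
    buf[i] = blockIdx.x;
}

// Pass 2: block b reads the slice written by block b+1 (wrapping around).
// This is only safe because it runs in a separate launch.
__global__ void readNeighborSlice(const int* buf, int* out) {
    int neighbor = (blockIdx.x + 1) % gridDim.x;
    int i = blockIdx.x * kThreads + threadIdx.x;
    out[i] = buf[neighbor * kThreads + threadIdx.x];
}

// Host helper: returns true if every block saw its neighbour's block index.
bool twoPassDemo(int blocks) {
    int n = blocks * kThreads;
    int *buf = nullptr, *out = nullptr;
    cudaMalloc(&buf, n * sizeof(int));
    cudaMalloc(&out, n * sizeof(int));
    fillOwnSlice<<<blocks, kThreads>>>(buf);
    // Same (default) stream: this launch starts only after fillOwnSlice
    // has finished, so all of its global writes are visible here.
    readNeighborSlice<<<blocks, kThreads>>>(buf, out);
    int* h = new int[n];
    bool ok = cudaMemcpy(h, out, n * sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int i = 0; ok && i < n; ++i)
        ok = h[i] == (i / kThreads + 1) % blocks;
    delete[] h;
    cudaFree(buf);
    cudaFree(out);
    return ok;
}
```

On a working GPU, `assert(twoPassDemo(8));` should hold. Moving the neighbour read into the first kernel would turn it into a race, since nothing orders one block's writes against another block's reads within a single launch.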

Mark

GregD · March 8, 2007, 1:16am

Has anyone tried implementing a mutex using Lamport's bakery algorithm (http://en.wikipedia.org/wiki/Lamport’s_b...f_the_algorithm) or similar?

The paper at http://research.microsoft.com/~gurevich/Opera/107.pdf proves the algorithm correct even with non-atomic memory operations.
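For readers unfamiliar with the algorithm, here is a hedged sketch of a bakery lock shared between blocks, using one thread per block. It is not GregD's code, and it goes beyond what this thread says is supported. It assumes a current CUDA toolkit with __threadfence() (used here as a device-wide fence between each store and the loads that follow). It also assumes volatile global accesses so the spin loops re-read memory, and that all participating blocks are resident on the GPU at the same time, which CUDA does not guarantee in general. Bakery, bakeryLock, bakeryUnlock, bakeryKernel, bakeryDemo and kMaxBlocks are hypothetical names. The critical section is a deliberately non-atomic increment. A final count of n * iters is therefore consistent with mutual exclusion, though a passing run cannot prove it.

```cuda
#include <cassert>
#include <cuda_runtime.h>

constexpr int kMaxBlocks = 8;  // hypothetical cap on participating blocks

// Lock state in global memory: one slot per block, zeroed before launch.
struct Bakery {
    int choosing[kMaxBlocks];
    int number[kMaxBlocks];   // 0 means "not trying to enter"
};

__device__ void bakeryLock(volatile Bakery* b, int me, int n) {
    b->choosing[me] = 1;
    __threadfence();                      // announce before reading tickets
    int mine = 0;
    for (int j = 0; j < n; ++j) {
        int t = b->number[j];
        if (t > mine) mine = t;
    }
    mine += 1;                            // take a ticket above all others seen
    b->number[me] = mine;
    __threadfence();
    b->choosing[me] = 0;
    __threadfence();                      // publish ticket before checking others
    for (int j = 0; j < n; ++j) {
        if (j == me) continue;
        while (b->choosing[j] != 0) { }   // j is still picking a ticket
        while (true) {                    // wait while j is ahead of us
            int tj = b->number[j];
            bool jFirst = tj != 0 && (tj < mine || (tj == mine && j < me));
            if (!jFirst) break;
        }
    }
    __threadfence();                      // order acquisition before the critical section
}

__device__ void bakeryUnlock(volatile Bakery* b, int me) {
    __threadfence();                      // make critical-section writes visible first
    b->number[me] = 0;
}

// One thread per block; each block enters the critical section `iters` times.
__global__ void bakeryKernel(Bakery* b, int* counter, int iters) {
    volatile Bakery* vb = b;
    volatile int* vc = counter;
    for (int k = 0; k < iters; ++k) {
        bakeryLock(vb, blockIdx.x, gridDim.x);
        int v = *vc;                      // deliberately non-atomic read-modify-write
        *vc = v + 1;
        bakeryUnlock(vb, blockIdx.x);
    }
}

// Host helper: returns the final counter, or -1 on error or bad n.
int bakeryDemo(int n, int iters) {
    if (n < 1 || n > kMaxBlocks) return -1;
    Bakery* b = nullptr;
    int* counter = nullptr;
    cudaMalloc(&b, sizeof(Bakery));
    cudaMalloc(&counter, sizeof(int));
    cudaMemset(b, 0, sizeof(Bakery));
    cudaMemset(counter, 0, sizeof(int));
    bakeryKernel<<<n, 1>>>(b, counter, iters);
    int result = -1;
    if (cudaMemcpy(&result, counter, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess)
        result = -1;
    cudaFree(b);
    cudaFree(counter);
    return result;
}
```

For example, `assert(bakeryDemo(4, 100) == 400);` should hold if the lock works. Keeping the grid tiny (at most kMaxBlocks single-thread blocks) is what makes the co-residency assumption plausible; a larger grid could spin forever waiting on a block that was never scheduled.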

I will eventually try this and post some source unless someone beats me to it.

EDIT (3/8/07)

See this thread for more info:
http://forums.nvidia.com/index.php?showtop...00&#entry168900