I have a couple of questions about global memory consistency in CUDA. I looked through the existing thread on memory consistency, but I thought it would be a good idea to ask these questions here again.
Is it guaranteed that all writes to global memory made by the threads of a thread block before a __syncthreads() are visible to all threads of that block after the __syncthreads() has executed?
The CUDA 2.0 manual only talks about the visibility of shared memory writes across a thread block in the context of __syncthreads(), while the 3.1 manual also mentions writes to global memory. Was this added to the semantics of __syncthreads() in later versions of CUDA?
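To make the first question concrete, here is a minimal sketch of the pattern I have in mind. The kernel name `neighbourRead` and the buffers `buf` and `out` are made up for illustration. Each thread writes one element to global memory, the block synchronizes, and each thread then reads the element written by a different thread of the same block.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only: every thread writes one
// element of a global buffer, waits at the barrier, then reads the element
// written by its neighbour in the same block.
__global__ void neighbourRead(int *buf, int *out)
{
    int base = blockIdx.x * blockDim.x;      // first element owned by this block
    int tid  = threadIdx.x;

    buf[base + tid] = base + tid;            // write to global memory

    __syncthreads();                         // block-wide barrier

    int neighbour = (tid + 1) % blockDim.x;  // another thread in the same block
    out[base + tid] = buf[base + neighbour]; // is the neighbour's write visible here?
}

int main()
{
    const int threads = 128, blocks = 4, n = threads * blocks;
    int *buf = nullptr, *out = nullptr;
    cudaMalloc((void **)&buf, n * sizeof(int));
    cudaMalloc((void **)&out, n * sizeof(int));
    cudaMemset(buf, 0xFF, n * sizeof(int));  // fill with -1 so a stale read is detectable

    neighbourRead<<<blocks, threads>>>(buf, out);

    int host[n];
    cudaError_t err = cudaMemcpy(host, out, n * sizeof(int), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Compare each result with the value the neighbour should have written.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        int expected = (i / threads) * threads + (i % threads + 1) % threads;
        if (host[i] != expected) ++mismatches;
    }
    printf("mismatches: %d\n", mismatches);

    cudaFree(buf);
    cudaFree(out);
    return 0;
}
```

A run without mismatches on one device would of course not prove anything about the guarantee, which is why I am asking what the documented semantics are. Note that the reads only cross threads within a block; nothing here depends on visibility between different blocks.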
Is there a single document that explains the consistency aspects of the CUDA memory model in detail? Such a document would be really helpful!
I would appreciate any help on these.
Thanks in advance!