I have a couple of questions regarding global memory consistency in CUDA. I looked through the existing thread on memory consistency, but I thought it might be a good idea to ask these questions here again.
Is it guaranteed that all writes to global memory by threads in a thread block prior to a __syncthreads() are visible to all threads in that block after the __syncthreads() has executed?
The CUDA 2.0 manual only talks about the visibility of shared memory writes across a thread block in the context of __syncthreads(), while the 3.1 manual also talks about writes to global memory. Was this guarantee added to the semantics of __syncthreads() in a later version of CUDA?
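To make the question concrete, here is a minimal sketch (the kernel and variable names are just made up for illustration) of the pattern I'm asking about — a global-memory write before the barrier being read by other threads of the same block after it:

```cuda
// Hypothetical kernel: thread 0 writes a flag to GLOBAL memory, then every
// thread in the block reads it back after the barrier. The question is
// whether __syncthreads() guarantees the read observes the write.
__global__ void block_visibility(int *flag, int *out)
{
    if (threadIdx.x == 0)
        *flag = 42;            // global-memory write before the barrier

    __syncthreads();           // barrier; per the 3.1+ manuals this also
                               // makes prior global writes visible to the
                               // rest of the block

    out[threadIdx.x] = *flag;  // does every thread in the block see 42?
}
```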
Is there a single document that explains the consistency aspects of the CUDA memory model in detail? Such a document would be really helpful!
I’m not sure there is a good single source for this, but you should take a look at Appendix B.5 of the CUDA C Programming Guide (4.0), which describes the memory fence functions. Those functions add to the default behavior, so reading through them may give you a better sense of the consistency issues.