I’m trying to figure out whether load and store operations on primitive types are atomic when reading from and writing to shared memory in CUDA.
On the one hand, any load/store appears to compile to the PTX instruction ld.weak.shared.cta,
which does not enforce atomicity. On the other hand, the manual says that accesses to the same bank are serialized (9.2.3.1):
However, if multiple addresses of a memory request map to the same memory bank, the accesses are serialized
which hints at load/store atomicity “by default” in shared memory. So would the instructions ld.weak.shared.cta
and ld.relaxed.shared.cta
have the same effect?
Or is the relaxed qualifier simply information the compiler needs anyway, to avoid optimizing loads and stores away?
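For concreteness, here is the kind of minimal kernel I have in mind (just a sketch: the kernel and variable names are made up, and I use cuda::atomic_ref, available in recent libcu++ versions, only to force an explicitly relaxed access next to a plain one). My expectation is that the plain accesses show up as the weak form and the relaxed load as the relaxed form when inspecting the PTX, e.g. with nvcc -ptx:

```cpp
#include <cuda/atomic>

// Sketch: the same __shared__ int read once with a plain load and once
// through cuda::atomic_ref with memory_order_relaxed, so the PTX emitted
// for the two forms can be compared.
__global__ void compare_shared_loads(int* out)
{
    __shared__ int s;

    if (threadIdx.x == 0) {
        s = 42;             // plain store to shared memory
    }
    __syncthreads();

    int plain = s;          // plain load

    cuda::atomic_ref<int, cuda::thread_scope_block> ref(s);
    int relaxed = ref.load(cuda::std::memory_order_relaxed);   // explicitly relaxed load

    out[threadIdx.x] = plain + relaxed;   // keep both loads observable
}
```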
More generally, assuming the variables are properly aligned, would __shared__ int
and __shared__ cuda::atomic<int, cuda::thread_scope_block>
provide the same guarantees (when considering only load and store operations)?
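To make this comparison concrete, these are the two variants I have in mind, restricted to plain load()/store() on the atomic side, with one thread storing while the other threads read the same location (again just a sketch with made-up names). If each individual access is atomic, every reader should observe either 0 or 123456789, never a torn value:

```cpp
#include <cuda/atomic>

// Variant A: plain __shared__ int, accessed only with ordinary loads/stores.
__global__ void variant_plain(int* out)
{
    __shared__ int value;
    if (threadIdx.x == 0) value = 0;
    __syncthreads();

    if (threadIdx.x == 0) {
        value = 123456789;                    // plain store
    } else {
        out[threadIdx.x] = value;             // plain load, concurrent with the store
    }
}

// Variant B: block-scoped cuda::atomic, using only load()/store().
__global__ void variant_atomic(int* out)
{
    __shared__ cuda::atomic<int, cuda::thread_scope_block> value;
    if (threadIdx.x == 0) value.store(0, cuda::std::memory_order_relaxed);
    __syncthreads();

    if (threadIdx.x == 0) {
        value.store(123456789, cuda::std::memory_order_relaxed);          // relaxed store
    } else {
        out[threadIdx.x] = value.load(cuda::std::memory_order_relaxed);   // relaxed load
    }
}
```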
Bonus (related) question: with a properly aligned primitive data type stored in global memory and accessed by threads of a single block, are __device__ int
and __device__ cuda::atomic<int, cuda::thread_scope_block>
equivalent in terms of atomicity of load/store operations?
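And here is a sketch of the global-memory counterpart I have in mind (made-up names again, assuming out has room for 2 * blockDim.x ints and that defining a cuda::atomic as a namespace-scope __device__ variable compiles with the toolkit at hand), with all accesses coming from a single block:

```cpp
#include <cuda/atomic>

__device__ int plain_value;                                           // plain global int
__device__ cuda::atomic<int, cuda::thread_scope_block> atomic_value;  // block-scoped atomic in global memory

// Only a single block ever touches these two variables.
__global__ void single_block_kernel(int* out)
{
    if (threadIdx.x == 0) {
        plain_value = 123456789;                                          // plain store
        atomic_value.store(123456789, cuda::std::memory_order_relaxed);   // relaxed store
    } else {
        out[2 * threadIdx.x]     = plain_value;                                        // plain load
        out[2 * threadIdx.x + 1] = atomic_value.load(cuda::std::memory_order_relaxed); // relaxed load
    }
}
```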
Thanks for any insight.