I know that in CUDA programming, memory operations at different levels can overlap. For example, transfers from global memory to shared memory can overlap with transfers from shared memory to registers. But can read and write operations at the same memory level overlap, for example reads and writes to shared memory?
Thank you very much!
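(For context, the overlap described in the question is the classic double-buffering pattern. The sketch below is illustrative only; the kernel name, tile size, and reduction are made up for this example, not taken from any particular codebase. The global load for the next tile is issued early so its latency overlaps with the shared-memory reads for the current tile.)

```cuda
// Hedged sketch: double-buffered tiling with a hypothetical 256-thread block.
// Each iteration issues the global->register load for tile i+1 before
// consuming tile i from shared memory, so the two overlap in time.
__global__ void tiled_sum(const float *in, float *out, int ntiles)
{
    __shared__ float tile[2][256];
    int t = threadIdx.x;
    float acc = 0.0f;

    tile[0][t] = in[t];            // preload tile 0
    __syncthreads();

    for (int i = 0; i < ntiles; ++i) {
        int cur = i & 1;
        // Issue the global load for the next tile now; its long latency
        // overlaps with the shared->register reads below.
        float next = (i + 1 < ntiles) ? in[(i + 1) * 256 + t] : 0.0f;

        acc += tile[cur][t];       // shared -> register read

        __syncthreads();
        if (i + 1 < ntiles)
            tile[1 - cur][t] = next;   // register -> shared store
        __syncthreads();
    }
    out[t] = acc;
}
```

On Ampere and newer GPUs the same idea can be expressed more directly with asynchronous copies (`memcpy_async` / `cuda::pipeline`), which move data from global to shared memory without staging through registers.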
All operations on a GPU have latency. That is, it takes multiple clock cycles for any transaction to complete, although new requests can typically be issued every clock cycle or every other clock cycle.
In that context, nearly anything can overlap.
If you mean "can a shared load and a shared store be issued to shared memory in the same clock cycle on the same SM?", that is not well specified by NVIDIA, and comes down to shared-memory bandwidth. It is not much different from asking whether two reads can be issued in the same cycle. You would need to study microbenchmarking papers to see what the results are likely to be. However, my mental model is that shared memory typically has enough bandwidth to accept requests at full rate (i.e., one request per clock) from a single sub-partition in a single SM. That may not be accurate in all cases. I do not assume, for example, that shared memory can accept requests simultaneously from all 4 sub-partitions in a modern GPU SM. My expectation is that, in some unspecified way, those requests would be serialized to some degree.
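The kind of measurement those microbenchmarking papers make can be sketched roughly as below. This is an assumption-laden illustration, not a validated benchmark: the kernel name, buffer size, and unroll count are arbitrary, and a real study would control for pipeline effects, bank conflicts, and compiler scheduling far more carefully.

```cuda
// Hedged sketch of a shared-memory load/store timing probe. It times an
// unrolled sequence of dependent shared load/store pairs with clock64();
// comparing the per-pair cost against a loads-only variant of the loop
// hints at whether loads and stores contend for shared-memory bandwidth.
__global__ void smem_ldst_probe(long long *cycles, float *sink)
{
    __shared__ float buf[1024];
    int t = threadIdx.x;
    buf[t] = (float)t;
    __syncthreads();

    float v = buf[t];
    long long start = clock64();
    #pragma unroll
    for (int i = 0; i < 256; ++i) {
        buf[(t + i) & 1023] = v;        // shared store
        v = buf[(t + i + 1) & 1023];    // dependent shared load
    }
    long long stop = clock64();

    if (t == 0) *cycles = stop - start;
    sink[t] = v;   // keep the compiler from optimizing the loop away
}
```

Results from a probe like this vary by architecture, which is exactly why the behavior is best treated as unspecified rather than assumed.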