Can threads from different warps access shared memory at the same time?

Curefab · April 22, 2024, 10:19pm

CUDA SMs comprise 4 partitions, each with its own independent arithmetic units (cores).
Each warp running on an SM is assigned to one specific partition.
Each partition can schedule one instruction every clock cycle.

Shared memory is accessed through the LSU (Load/Store Unit). It is a single resource shared by all 4 partitions.
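Over the last few architecture generations, the typical rate has been one access per cycle per SM (in some cases one access every 2 cycles). The SM partitions do not each have their own port to shared memory, so accesses from warps in different partitions are handled one after another rather than in the same cycle.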
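If you want to check this on your own GPU, something like the following rough microbenchmark might help. It is only a sketch: the names sharedLoads, kIters and kWords are made up for this example, it assumes 32 banks of 4-byte words, and it launches a single block so that all warps sit on one SM. Each warp issues conflict-free shared memory loads, and the host prints the elapsed SM cycles divided by the total number of warp-wide loads. The exact numbers depend on the architecture and on what the compiler does with the loop, so treat them as indicative only. If the description above holds, the figure should level off at roughly 1 to 2 cycles as warps are added, instead of continuing to drop.

```cuda
#include <algorithm>
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kIters = 1 << 14;  // warp-wide loads issued by each warp
constexpr int kWords = 1024;     // 4 KiB of shared memory: 32 rows of 32 words

// Hypothetical benchmark kernel, not taken from any NVIDIA sample.
__global__ void sharedLoads(long long *start, long long *stop, int *sink)
{
    __shared__ int smem[kWords];
    for (int i = threadIdx.x; i < kWords; i += blockDim.x)
        smem[i] = i;
    __syncthreads();

    // volatile stops the compiler from hoisting or merging the loads.
    volatile int *s = smem;
    const int lane = threadIdx.x % 32;
    int acc = 0;

    long long t0 = clock64();
    #pragma unroll 8
    for (int i = 0; i < kIters; ++i)
        acc += s[(i % 32) * 32 + lane];  // lane k always reads bank k: no conflicts
    long long t1 = clock64();

    sink[threadIdx.x] = acc;  // consume the result so the loop is not removed
    if (lane == 0) {          // one timestamp pair per warp
        start[threadIdx.x / 32] = t0;
        stop[threadIdx.x / 32]  = t1;
    }
}

int main()
{
    long long *start = nullptr, *stop = nullptr;
    int *sink = nullptr;
    cudaMallocManaged(&start, 32 * sizeof(long long));
    cudaMallocManaged(&stop, 32 * sizeof(long long));
    cudaMallocManaged(&sink, 1024 * sizeof(int));

    for (int warps = 1; warps <= 32; warps *= 2) {
        sharedLoads<<<1, warps * 32>>>(start, stop, sink);  // one block, one SM
        cudaError_t err = cudaGetLastError();
        if (err == cudaSuccess)
            err = cudaDeviceSynchronize();
        if (err != cudaSuccess) {
            std::printf("CUDA error: %s\n", cudaGetErrorString(err));
            return 1;
        }
        // Wall time on the SM: earliest start to latest stop over all warps.
        long long first = *std::min_element(start, start + warps);
        long long last  = *std::max_element(stop, stop + warps);
        double perLoad = double(last - first) / (double(warps) * kIters);
        std::printf("%2d warps: %.2f cycles per warp-wide load\n", warps, perLoad);
    }

    cudaFree(start);
    cudaFree(stop);
    cudaFree(sink);
    return 0;
}
```

Compile with nvcc and run on an otherwise idle GPU. With only a few warps the result mostly reflects load latency and loop overhead, so the interesting part is the value at high warp counts. A profiler such as Nsight Compute would give a more reliable picture than clock64() timing.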

See also: How does the LSU (Load/Store Unit) execute Load/Store instructions in the Ampere architecture? - #8 by Greg
