correct. For maximum shared memory throughput, the rule is this: considering a warp-wide access, we want no more than one item requested per bank. The addresses do not need to be contiguous. Shared memory generally also has a broadcast rule: if there are multiple requests to the same bank, but they are also to the same location, this does not reduce efficiency. A particular location can be broadcast to multiple threads in a warp, per transaction, at no additional cost.
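To make the two rules concrete, here is a small host-side model (not GPU code) that counts how many serialized transactions one warp-wide 4-byte shared memory load would need. It assumes 32 banks, a 4-byte bank width and 32 threads per warp. That is a common configuration, but other modes (e.g. the 8-byte bank mode on compute capability 3.x) map addresses differently. The function name `shared_load_transactions` and the access patterns are hypothetical, chosen only for illustration.

```python
# Minimal model of one warp-wide 32-bit shared memory load.
# Assumptions (not from the post): 32 banks, 4-byte bank width, 32 threads per warp.
NUM_BANKS = 32
BANK_WIDTH_BYTES = 4
WARP_SIZE = 32


def shared_load_transactions(byte_addrs):
    """Hypothetical helper: return how many serialized transactions a warp needs.

    Distinct words that fall in the same bank must be serviced one after
    another. Threads reading the *same* word share one broadcast, so
    duplicate words are collapsed before counting.
    """
    assert len(byte_addrs) == WARP_SIZE
    words_per_bank = {}
    for addr in byte_addrs:
        word = addr // BANK_WIDTH_BYTES   # index of the 32-bit word
        bank = word % NUM_BANKS           # bank holding that word
        words_per_bank.setdefault(bank, set()).add(word)
    # The busiest bank (most distinct words) determines the serialization.
    return max(len(words) for words in words_per_bank.values())


lanes = range(WARP_SIZE)
contiguous = [4 * t for t in lanes]             # one word per bank
permuted = [4 * ((7 * t) % 32) for t in lanes]  # non-contiguous, still one word per bank
broadcast = [0 for _ in lanes]                  # every lane reads the same word
pairs = [4 * (t // 2) for t in lanes]           # two lanes per word: broadcast, no conflict
stride2 = [8 * t for t in lanes]                # two distinct words in each even bank
same_bank = [128 * t for t in lanes]            # 32 distinct words, all in bank 0

for name, pattern in [("contiguous", contiguous), ("permuted", permuted),
                      ("broadcast", broadcast), ("pairs", pairs),
                      ("stride2", stride2), ("same_bank", same_bank)]:
    print(name, shared_load_transactions(pattern))
```

Under these assumptions the contiguous, permuted, broadcast and paired patterns each need a single transaction, the stride-2 pattern needs 2, and the all-in-bank-0 pattern needs 32. The permuted case shows that contiguity is not required; only the number of distinct words per bank matters.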