How to understand the bank conflict of shared_mem

Robert_Crovella · July 25, 2023, 5:25am

When you store (or load) more than 4 bytes per thread, which is to say more than 128 bytes per warp, the GPU does not issue a single transaction. The largest transaction size is 128 bytes. If you request 16 bytes per thread, then warp-wide that is a total of 512 bytes per request. The GPU will break that up into 4 transactions (in that case T0-T7 make up one transaction, T8-T15 another, and so on), each of which is 128 bytes wide. The determination of bank conflicts is made per transaction, not per request, per warp, or per instruction.
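To make the splitting concrete, here is a small host-side sketch in Python (not CUDA, and not anything the hardware exposes). It only models the rule described above: a warp of 32 threads, a 128-byte maximum transaction, and threads grouped in index order. The function name `split_into_transactions` is made up for illustration, and real hardware may group threads differently on some architectures.

```python
WARP_SIZE = 32               # threads per warp
MAX_TRANSACTION_BYTES = 128  # largest shared-memory transaction, per the post


def split_into_transactions(bytes_per_thread):
    """Group the warp's thread indices into transactions of at most 128 bytes.

    Simplified model: threads are grouped in index order, as in the
    T0-T7, T8-T15, ... description above.
    """
    threads_per_transaction = MAX_TRANSACTION_BYTES // bytes_per_thread
    threads_per_transaction = max(1, min(threads_per_transaction, WARP_SIZE))
    return [
        list(range(start, min(start + threads_per_transaction, WARP_SIZE)))
        for start in range(0, WARP_SIZE, threads_per_transaction)
    ]


for width in (4, 8, 16):
    groups = split_into_transactions(width)
    print(f"{width:2d} bytes/thread: {len(groups)} transaction(s), "
          f"{len(groups[0])} threads each")
# prints:
#  4 bytes/thread: 1 transaction(s), 32 threads each
#  8 bytes/thread: 2 transaction(s), 16 threads each
# 16 bytes/thread: 4 transaction(s), 8 threads each
```

For the 16-byte case the first group is threads 0 through 7, which is the set of threads you need to look at when checking for conflicts.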

The second case is identical to the first in this respect. Considering just threads 0-7, or just threads 8-15, and the transaction associated with each, there is no bank conflict.

In the 3rd case, the request across the warp will be broken up the same way: threads 0-7 will constitute one transaction. When we look at the activity for those threads, we see that, for example, threads 0-3 are writing to the same column(s), and therefore to the same bank(s). So we expect 4-way bank conflicts there.
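Here is a rough way to count the conflict degree within one transaction, again as a host-side Python sketch rather than anything measured on a GPU. It assumes 32 banks of 4 bytes each, and it assumes a hypothetical layout in which a shared-memory row is 128 bytes wide and each thread stores 16 bytes. The exact indexing from the original question is not reproduced here, so `conflicting` below is only an illustrative pattern in which threads 0-3 hit the same columns of four different rows.

```python
from collections import defaultdict

NUM_BANKS = 32   # assumed bank count
BANK_WIDTH = 4   # assumed bank width in bytes


def conflict_degree(byte_addresses, bytes_per_thread):
    """Largest number of distinct 32-bit words that land in one bank.

    byte_addresses holds the starting byte offset of each thread in ONE
    transaction. Threads touching the same word do not conflict
    (broadcast), so words are collected in a set per bank.
    """
    words_per_bank = defaultdict(set)
    for addr in byte_addresses:
        for offset in range(0, bytes_per_thread, BANK_WIDTH):
            word = (addr + offset) // BANK_WIDTH
            words_per_bank[word % NUM_BANKS].add(word)
    return max(len(words) for words in words_per_bank.values())


ROW_BYTES = 128  # hypothetical row width: 32 four-byte words

# Threads 0-7 store 16 bytes each to consecutive addresses in one row.
contiguous = [t * 16 for t in range(8)]

# Hypothetical conflicting pattern: threads 0-3 write the same columns of
# rows 0-3, threads 4-7 write the next columns of rows 0-3.
conflicting = [(t % 4) * ROW_BYTES + (t // 4) * 16 for t in range(8)]

print(conflict_degree(contiguous, 16))   # prints 1 (no conflict)
print(conflict_degree(conflicting, 16))  # prints 4 (4-way conflict)
```

In the second pattern, threads 0-3 start at byte offsets 0, 128, 256 and 384, which all map to banks 0-3, so each of those banks is asked for four different words in the same transaction.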
