Shared memory bank conflicts with byte arrays

NeedWisdom · January 15, 2011, 12:19am

Forum,

It appears that shared memory is arranged as interleaved 32-bit words, with successive words belonging to different banks.
Suppose I declare a byte array in shared memory. Does that mean the first four bytes belong to one bank, the next four to another, and so on?
What does that imply for avoiding bank conflicts when, say, copying one shared memory buffer to another in parallel?

tera · January 15, 2011, 3:57am

Yes.
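
To make that mapping concrete, here is a minimal sketch. The helper name bank_of_byte is hypothetical (it is not a CUDA API), and it assumes 4-byte-wide banks, with 32 banks on compute capability 2.x and later and 16 on 1.x. The optional 8-byte bank mode on Kepler is not modeled.

```cuda
#include <cstddef>

// Hypothetical helper, not part of CUDA: which bank a byte offset into
// shared memory falls in, assuming 4-byte-wide banks.
// num_banks: 32 on cc 2.x and later, 16 on cc 1.x.
__host__ __device__ inline unsigned bank_of_byte(size_t byte_offset,
                                                 unsigned num_banks = 32)
{
    // Bytes 0..3 share word 0, bytes 4..7 share word 1, and so on;
    // words are then assigned to banks round-robin.
    return static_cast<unsigned>((byte_offset / 4) % num_banks);
}
```

With the default of 32 banks, offsets 0 through 3 all map to bank 0, offset 4 maps to bank 1, and offset 128 wraps back around to bank 0.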

Each thread should copy four (consecutive, aligned) bytes at once.

SPWorley · January 15, 2011, 7:04am

On Fermi, there’s no bank conflict on reading byte arrays since it supports multibroadcast shared memory.

But on GT200 and G80 you’ll get conflicts, and as tera says, it’s more efficient to have each thread copy a word.

But if you’re just COPYING the byte array you should use the one-word-per-thread method anyway just for speed even on Fermi.
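
A rough sketch of the one-word-per-thread copy might look like the following. The kernel name, the MAX_BYTES limit, the single-block launch, and the host helper run_copy_demo are all assumptions made for illustration. It also assumes the byte count is a multiple of 4 and that the buffers are 4-byte aligned, which is why they are declared as uint32_t.

```cuda
#include <cuda_runtime.h>
#include <cstdint>
#include <vector>

constexpr int MAX_BYTES = 4096;  // hypothetical per-block buffer size

// Demo kernel: stage bytes in one shared buffer, copy them to a second
// shared buffer one 32-bit word per thread, then write the result out.
// Assumes n_bytes is a multiple of 4, n_bytes <= MAX_BYTES, one block.
__global__ void copy_shared_wordwise(const unsigned char* in,
                                     unsigned char* out, int n_bytes)
{
    // Declared as uint32_t so both buffers are 4-byte aligned.
    __shared__ uint32_t src_words[MAX_BYTES / 4];
    __shared__ uint32_t dst_words[MAX_BYTES / 4];
    unsigned char* src_bytes = reinterpret_cast<unsigned char*>(src_words);
    unsigned char* dst_bytes = reinterpret_cast<unsigned char*>(dst_words);

    // Fill the source buffer (byte-wise only to set up the demo).
    for (int i = threadIdx.x; i < n_bytes; i += blockDim.x)
        src_bytes[i] = in[i];
    __syncthreads();

    // The copy itself: thread t moves word t, so consecutive threads
    // touch consecutive banks.
    int n_words = n_bytes / 4;
    for (int w = threadIdx.x; w < n_words; w += blockDim.x)
        dst_words[w] = src_words[w];
    __syncthreads();

    for (int i = threadIdx.x; i < n_bytes; i += blockDim.x)
        out[i] = dst_bytes[i];
}

// Host-side helper for the demo: true if the data survived the round trip.
bool run_copy_demo(int n_bytes)
{
    if (n_bytes <= 0 || n_bytes > MAX_BYTES || n_bytes % 4 != 0) return false;
    std::vector<unsigned char> h_in(n_bytes), h_out(n_bytes, 0);
    for (int i = 0; i < n_bytes; ++i)
        h_in[i] = static_cast<unsigned char>(i * 7 + 3);

    unsigned char *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n_bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n_bytes) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in.data(), n_bytes, cudaMemcpyHostToDevice);

    copy_shared_wordwise<<<1, 128>>>(d_in, d_out, n_bytes);
    bool ok = cudaDeviceSynchronize() == cudaSuccess;
    ok = ok && cudaMemcpy(h_out.data(), d_out, n_bytes,
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_in);
    cudaFree(d_out);
    return ok && h_out == h_in;
}
```

Calling run_copy_demo(1024) launches one block of 128 threads, each of which moves two words in the copy loop.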

devnglee · April 19, 2017, 12:25am

There are some articles out there saying that accessing byte arrays causes bank conflicts. But the CUDA C Programming Guide (v8.0.61), section G.3.3, Shared Memory (http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#shared-memory-2-x), says:

A shared memory request for a warp does not generate a bank conflict between two threads that access any address within the same 32-bit word (even though the two addresses fall in the same bank): In that case, for read accesses, the word is broadcast to the requesting threads (multiple words can be broadcast in a single transaction) and for write accesses, each address is written by only one of the threads (which thread performs the write is undefined).

This means, in particular, that there are no bank conflicts if an array of char is accessed as follows, for example:
extern __shared__ char shared[];
char data = shared[BaseIndex + tid];

This seems to apply to compute capability >= 2.x.
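
For reference, the quoted access pattern could be placed in a complete kernel roughly as follows. This is only a sketch: the kernel name, the staging loop, the single-block launch, and the host helper run_read_demo are assumptions, and base_index stands in for the guide's BaseIndex.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel around the guide's pattern: each thread reads one
// char, so four neighbouring threads read bytes of the same 32-bit word.
__global__ void read_bytes_per_thread(const char* in, char* out, int base_index)
{
    extern __shared__ char shared[];  // sized at launch time
    int tid = threadIdx.x;
    int n_staged = base_index + blockDim.x;

    // Stage the input in shared memory.
    for (int i = tid; i < n_staged; i += blockDim.x)
        shared[i] = in[i];
    __syncthreads();

    // The access pattern from the guide.
    char data = shared[base_index + tid];
    out[tid] = data;
}

// Host-side helper for the demo: true if every thread read the expected byte.
bool run_read_demo(int base_index, int n_threads)
{
    int n_staged = base_index + n_threads;
    std::vector<char> h_in(n_staged), h_out(n_threads, 0);
    for (int i = 0; i < n_staged; ++i)
        h_in[i] = static_cast<char>(i % 127);

    char *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n_staged) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n_threads) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in.data(), n_staged, cudaMemcpyHostToDevice);

    // Third launch parameter: bytes of dynamic shared memory per block.
    read_bytes_per_thread<<<1, n_threads, n_staged>>>(d_in, d_out, base_index);
    bool ok = cudaDeviceSynchronize() == cudaSuccess;
    ok = ok && cudaMemcpy(h_out.data(), d_out, n_threads,
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_in);
    cudaFree(d_out);

    for (int t = 0; ok && t < n_threads; ++t)
        ok = (h_out[t] == h_in[base_index + t]);
    return ok;
}
```

The sketch only checks that the reads return the right bytes. Confirming the absence of bank conflicts would need a profiler run on the target GPU.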

So am I correct in concluding that accessing byte arrays in shared memory does cause bank conflicts on compute capability 1.x (Tesla), and does not cause them on modern GPUs with compute capability >= 2.x (Fermi, Kepler, Maxwell, and Pascal)?

Robert_Crovella · April 19, 2017, 12:29am

Honestly, who cares about cc 1.x?

Yes, for cc 2.x and higher, threads that access bytes within the same 32-bit word will not cause bank conflicts, effectively due to the broadcast mechanism.
