About shared memory banks

I have a float array A[5].
Its total size in bytes is 5 * 4 = 20 B.
In compute capability 5.2 we have 8 B per bank (32 banks in total, so 256 B).

  1. How is the total size of shared memory per SM calculated (in compute capability 5.2 it is ~98 KB)?
  2. How are the bytes of each array element distributed across the banks?
    e.g.
    A[0] and A[1] are stored in bank 0
    A[2] and A[3] are stored in bank 1
    etc.
    or
    byte 0 of A[0] in bank 0
    byte 1 of A[0] in bank 1
    byte 2 of A[0] in bank 2
    etc.
    or something else?

Thanks to all.

I don’t think so. See here. Compute capability 5.x is 4 bytes/bank.

For:

__shared__ float A[size];

A[0] is in bank 0.
A[1] is in bank 1.
A[2] is in bank 2.
...
A[31] is in bank 31.
A[32] is in bank 0.
A[33] is in bank 1.
etc.
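
The mapping above is plain address arithmetic. Here is a minimal host-side C++ sketch of it (not device code), assuming the 4-byte bank width and 32 banks stated above. The names `kBankWidthBytes`, `kNumBanks`, `bank_of_byte`, and `bank_of_float` are made up for illustration and are not part of any CUDA API.

```cpp
#include <cstddef>

// Assumptions taken from the post above: 32 banks, each one 32-bit word (4 bytes) wide.
constexpr std::size_t kBankWidthBytes = 4;
constexpr std::size_t kNumBanks = 32;

// Bank that holds the byte at `byte_offset` from the start of the shared array.
constexpr std::size_t bank_of_byte(std::size_t byte_offset) {
    return (byte_offset / kBankWidthBytes) % kNumBanks;
}

// Bank that holds element `i` of a float array starting at offset 0
// (hypothetical helper; assumes sizeof(float) == 4).
constexpr std::size_t bank_of_float(std::size_t i) {
    return bank_of_byte(i * sizeof(float));
}
```

In this model all four bytes of A[0] fall in bank 0, so an element is not split byte by byte across banks, and A[32] wraps around to bank 0 again.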

Thanks for your reply.
In this case, if A is a float array (4 B per element),
each bank is one 32-bit word wide (4 B/bank),
and the elements of A are distributed as above,
then if thread0 and thread1 both access A[0],
we don't have a conflict.

Is this true?

Yes, true. Two threads in a warp that access different 32-bit words in the same bank will have a bank conflict. However, two or more threads accessing the same location are covered by the broadcast rule: the value is broadcast to all threads requesting it, and no conflict arises from the broadcast.
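
To make the rule concrete, here is a rough host-side C++ model of it (not device code, and not how the hardware is implemented). It assumes 32 banks, 32 threads per warp, and 32-bit accesses, and counts how many distinct words a warp requests from the busiest bank. `conflict_degree_for_stride` and the constants are hypothetical names for this sketch only.

```cpp
#include <cstddef>

constexpr std::size_t kBanks = 32;     // assumption: 32 banks of 32-bit words
constexpr std::size_t kWarpSize = 32;  // threads per warp

// Thread t reads word index t * stride. Returns the largest number of
// distinct words that fall in a single bank: 1 means conflict-free,
// n means an n-way bank conflict. Threads reading the same word count
// once, which models the broadcast rule.
constexpr std::size_t conflict_degree_for_stride(std::size_t stride) {
    std::size_t word[kWarpSize] = {};
    for (std::size_t t = 0; t < kWarpSize; ++t) word[t] = t * stride;

    std::size_t worst = 0;
    for (std::size_t bank = 0; bank < kBanks; ++bank) {
        std::size_t distinct = 0;
        for (std::size_t t = 0; t < kWarpSize; ++t) {
            if (word[t] % kBanks != bank) continue;
            bool seen = false;  // has an earlier thread requested this same word?
            for (std::size_t u = 0; u < t; ++u)
                if (word[u] == word[t]) seen = true;
            if (!seen) ++distinct;
        }
        if (distinct > worst) worst = distinct;
    }
    return worst;
}
```

In this model a stride of 0 (every thread reads A[0]) and a stride of 1 (thread t reads A[t]) both give 1, a stride of 2 gives a 2-way conflict, and a stride of 32 puts all 32 threads on different words of bank 0.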

Thank you for your answer.

Another question:
In compute capability 5.2 the shared memory is 64 KB.
Does this mean that the 64 KB is divided among 32 banks?
e.g.
The word indices would be
Bank0: 0, 32, 64, 96, 128, …
Bank1: 1, 33, 65, 97, 129, …
…
Bank31: 31, 63, 95, 127, 159, … 15999

Yes, correct. (Well, I don’t know about the 15999 number, but the rest of it is the way I would describe it.)
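
As a check on the last index, here is the arithmetic as a small host-side C++ sketch. It takes the 64 KB figure from the question as given and reads it as 64 * 1024 bytes; both are assumptions, and the constant names are made up for illustration.

```cpp
#include <cstddef>

// Assumption: the 64 KB figure from the question, read as 64 * 1024 bytes.
constexpr std::size_t kSharedBytes = 64 * 1024;
constexpr std::size_t kWordBytes = 4;   // one 32-bit word per bank slot
constexpr std::size_t kBankCount = 32;

constexpr std::size_t kTotalWords = kSharedBytes / kWordBytes;   // 16384
constexpr std::size_t kWordsPerBank = kTotalWords / kBankCount;  // 512
constexpr std::size_t kLastWordIndex = kTotalWords - 1;          // 16383

// Bank that holds a given word index.
constexpr std::size_t bank_of_word(std::size_t word) {
    return word % kBankCount;
}
```

Under that reading the last word index is 16383, not 15999. The figure 15999 follows if 64 KB is taken as 64,000 bytes. Either way the last word falls in bank 31.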

Thanks again.