question about the shared memory

Can any one can explain the structure of shared memory? especially about how the different type data is stored in it. when I read some code, I am always confused by some bit operation with shared memory, and the sample of Histogram64 is a good case. when I read it, I can not understand why the bit-shift operation should be done like that. In the programming guide it says that: In the case of the shared memory space, the banks are organized such that successive 32-bit words are assigned to successive banks and each bank has a bandwidth of 32 bits per two clock cycles.

the shared memory is 16K, but every bank just stores 32 bit, the total size is just (32/8)*16 byte. so how in every bank the memory is organized?

The banks are interleaved in 4-byte (32-bit) increments. So you get this:

               Bank0    Bank1   Bank2    ...  Bank15  Bank0    Bank1 ...

                4 bytes  4 bytes 4 bytes   ... 4 bytes 4 bytes  4 bytes ...

Address:       0        4       8    ...      60      64       68 ...

The banks repeat like this, let’s say you have shared char x[256], then

x[0…63] is in bank (subsequently) 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6 6 7 7 7 7 7 8 8 8 8 9 9 9 9 10 10 10 10 11 11 11 11 12 12 12 12 13 13 13 13 14 14 14 14 15 15 15 15
x[64…127] is in bank (subsequently) 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6 6 7 7 7 7 7 8 8 8 8 9 9 9 9 10 10 10 10 11 11 11 11 12 12 12 12 13 13 13 13 14 14 14 14 15 15 15 15
and again, repeated for 128…191 and 192…255

Thanks for your reply. Then, how to explain this:

(1)

shared int int_data[DATA_N];

//…

Int t = int_data[threadIdx.x];

For 4-byte elements and some index [i] bank number (assuming [0] is 0-th bank) is encoded in bits [3 : 0] simply due to shared memory structure.

In this case I == threadIdx.x. Let’s consider a half-warp, or a group of 16 threads, they have the same bits [31 : 4], but lower bits [3 : 0] all different. So each half-warp accesses 16 different shared memory banks, and thus there are no bank conflicts.

there are only 16 banks, where comes the bits[31 : 4], it should be [15 : 4]?

(2)

shared unsigned char char_data[DATA_N];

//…

unsigned char t = char_data[threadIdx.x];

Now it’s a bit harder: for 1-byte elements and some index [i] bank number is encoded in bits [5 : 2], due to 4-byte bank width.

Now in the same fashion I = threadidx.x and let’s consider a half-warp: lower bits [3 : 0] all different, bits [31 : 4] are constant, so each halfwarp effectively uses only 4 different banks, and there are 4-way bank conflicts. But if we shuffle threaIdx.x’s bits from [3 : 0] to [5 : 2], each thread within a halfwarp accesses it’s own bank, since we’ve “matched” thread index with bank index, encoded into array index.

I am totally confused by those point. can any one give me a answer about that with a chart?

int type is 32 bits.

Not sure what you mean by shuffling. Using [5:2] bits is equivalent to addressing 4-byte words, not bytes.