Can anyone explain the structure of shared memory, especially how data of different types is stored in it? When I read code, I am often confused by the bit operations done on shared memory indices, and the Histogram64 sample is a good example: I cannot understand why the bit shifts are done the way they are. The programming guide says: "In the case of the shared memory space, the banks are organized such that successive 32-bit words are assigned to successive banks and each bank has a bandwidth of 32 bits per two clock cycles."
Shared memory is 16 KB, but if every bank stores just 32 bits, the total size would be only (32/8)*16 = 64 bytes. So how is the memory organized inside each bank?
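One way to picture this: a bank is not a single 32-bit word but a column that is 32 bits wide and many rows deep. Below is a rough sketch in plain C (host code, no GPU needed) of that layout, assuming the figures from this post: 16 banks, 4-byte bank width, 16 KB in total. The function names are made up for illustration.

```c
#include <stdio.h>

enum {
    NUM_BANKS   = 16,         /* number of banks, as stated in the post */
    BANK_WIDTH  = 4,          /* bytes per bank word (32 bits) */
    SHARED_SIZE = 16 * 1024   /* total bytes of shared memory */
};

/* Bank that holds the byte at byte_addr: successive 32-bit words
   go to successive banks, wrapping around after NUM_BANKS words. */
unsigned bank_of_byte(unsigned byte_addr)
{
    return (byte_addr / BANK_WIDTH) % NUM_BANKS;
}

/* Row (word slot) inside that bank. */
unsigned row_of_byte(unsigned byte_addr)
{
    return byte_addr / (BANK_WIDTH * NUM_BANKS);
}

/* Number of 32-bit words stored in each bank. */
unsigned words_per_bank(void)
{
    return SHARED_SIZE / (BANK_WIDTH * NUM_BANKS);
}

/* Print a small chart: one line per row, one column per bank,
   each cell showing the first byte address of the word stored there. */
void print_bank_chart(unsigned rows)
{
    for (unsigned r = 0; r < rows; ++r) {
        printf("row %3u:", r);
        for (unsigned b = 0; b < NUM_BANKS; ++b)
            printf(" %4u", (r * NUM_BANKS + b) * BANK_WIDTH);
        printf("\n");
    }
}
```

Under these assumptions each bank holds 256 words (1 KB), and 16 such banks make up the 16 KB. Calling print_bank_chart(2) lists byte addresses 0, 4, ..., 60 in the first row and 64, 68, ..., 124 in the second, so byte 64 lands back in bank 0, one row further down.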
For 4-byte elements and some index [i], the bank number (assuming element [0] is in bank 0) is encoded in bits [3 : 0] of the index, simply because of how shared memory is structured.
In this case i == threadIdx.x. Consider a half-warp, a group of 16 threads: their indices have the same bits [31 : 4], but the lower bits [3 : 0] are all different. So each half-warp accesses 16 different shared memory banks, and there are no bank conflicts.
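The 4-byte case can be checked with a few lines of plain C. This is only a sketch, assuming 16 banks, a half-warp of 16 threads, and an array that starts in bank 0; the helper names are hypothetical. The index is treated as a 32-bit integer, so bits [3 : 0] pick the bank and the remaining bits [31 : 4] only pick the row inside that bank.

```c
enum { NUM_BANKS = 16, HALF_WARP = 16 };

/* Bank for element i of an array of 4-byte elements that starts in bank 0:
   the byte address is 4*i, so the word index is i and the bank is i[3:0]. */
unsigned bank_of_word_index(unsigned i)
{
    return i & (NUM_BANKS - 1);
}

/* Largest number of threads in one half-warp that hit the same bank
   when thread k of the half-warp reads element (first_thread + k).
   A result of 1 means there is no bank conflict in this model. */
unsigned conflict_degree_words(unsigned first_thread)
{
    unsigned hits[NUM_BANKS] = {0};
    unsigned worst = 0;
    for (unsigned k = 0; k < HALF_WARP; ++k) {
        unsigned n = ++hits[bank_of_word_index(first_thread + k)];
        if (n > worst)
            worst = n;
    }
    return worst;
}
```

For any half-warp, for example first_thread equal to 0 or 16, conflict_degree_words returns 1: the 16 consecutive indices differ in bits [3 : 0], so each thread gets its own bank.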
There are only 16 banks, so where do bits [31 : 4] come from? Shouldn't it be [15 : 4]?
(2)
__shared__ unsigned char char_data[DATA_N];
//…
unsigned char t = char_data[threadIdx.x];
Now it’s a bit harder: for 1-byte elements and some index [i], the bank number is encoded in bits [5 : 2], because each bank is 4 bytes wide.
In the same fashion, i == threadIdx.x, and let’s consider a half-warp: the lower bits [3 : 0] are all different and bits [31 : 4] are constant, so each half-warp effectively uses only 4 different banks, and there are 4-way bank conflicts. But if we shuffle threadIdx.x’s bits from [3 : 0] to [5 : 2], each thread within a half-warp accesses its own bank, since we’ve “matched” the thread index with the bank index encoded in the array index.
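The 1-byte case, with and without the shuffle, can be modeled the same way. This is a sketch in plain C under the post's assumptions (16 banks, 4-byte bank width, array starting in bank 0). It simply counts threads per bank and does not model any hardware broadcast. The shuffle_index function is an illustrative permutation that moves bits [3 : 0] to [5 : 2] and bits [5 : 4] to [1 : 0]; the exact expression in the Histogram64 sample may differ.

```c
enum { NUM_BANKS = 16, BANK_WIDTH = 4, HALF_WARP = 16 };

/* Bank for element i of an unsigned char array that starts in bank 0:
   four bytes share one 32-bit word, so the bank is i[5:2]. */
unsigned bank_of_byte_index(unsigned i)
{
    return (i / BANK_WIDTH) % NUM_BANKS;
}

/* Move bits [3:0] of t to [5:2] and bits [5:4] to [1:0]; bits [31:6] stay.
   Within every group of 64 indices this is a permutation. */
unsigned shuffle_index(unsigned t)
{
    return (t & ~63u) | ((t & 15u) << 2) | ((t & 48u) >> 4);
}

/* Largest number of half-warp threads that hit the same bank.
   first_thread is the first threadIdx.x of the half-warp (a multiple of 16);
   shuffled selects between char_data[t] and char_data[shuffle_index(t)]. */
unsigned conflict_degree_bytes(unsigned first_thread, int shuffled)
{
    unsigned hits[NUM_BANKS] = {0};
    unsigned worst = 0;
    for (unsigned k = 0; k < HALF_WARP; ++k) {
        unsigned t = first_thread + k;
        unsigned i = shuffled ? shuffle_index(t) : t;
        unsigned n = ++hits[bank_of_byte_index(i)];
        if (n > worst)
            worst = n;
    }
    return worst;
}
```

Without the shuffle, threads 0..15 read bytes 0..15, which sit in banks 0, 0, 0, 0, 1, 1, 1, 1, 2, ... and give a 4-way conflict. With the shuffle, the same threads read bytes 0, 4, 8, ..., 60, one per bank, and the next half-warp (threads 16..31) reads bytes 1, 5, 9, ..., 61, again one per bank.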
I am totally confused by these points. Can anyone explain them, ideally with a chart?