Can anybody explain the shared memory model and banks? I’m very confused by them. I read the Programming Guide 2.2 (section 220.127.116.11), but it doesn’t seem to have a clear explanation.
It says that “the banks are organized such that successive 32-bit words are assigned to successive banks and each bank has a bandwidth of 32 bits per two clock cycles.” Does that mean that the 32-bit words at byte addresses 0, 4, 8, 12 will be in banks 0, 1, 2, 3?
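To make my reading of that sentence concrete, here is a small host-side sketch of the mapping as I understand it. The helper name bank_of_byte_offset is my own (hypothetical), and the bank count of 16 used in the usage note is my assumption for compute capability 1.x devices; it is not stated in the sentence quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Size of one shared memory word: 32 bits, as in the quoted sentence. */
enum { WORD_BYTES = 4 };

/* Hypothetical helper: which bank holds the 32-bit word that contains
   the given byte offset into shared memory. num_banks is a parameter
   because the bank count depends on the device (assumed, not quoted). */
static unsigned bank_of_byte_offset(size_t byte_offset, unsigned num_banks)
{
    assert(num_banks > 0);
    size_t word_index = byte_offset / WORD_BYTES; /* index of the 32-bit word */
    return (unsigned)(word_index % num_banks);    /* successive words go to successive banks */
}
```

If this reading is right, bank_of_byte_offset(0, 16), (4, 16), (8, 16) and (12, 16) give banks 0, 1, 2 and 3, and byte offset 64 wraps around to bank 0 again.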
In the Programming Guide 2.2 (section 18.104.22.168):
__shared__ float shared[32];
float data = shared[BaseIndex + s * tid];
In this case, the threads tid and tid+n access the same bank whenever s*n is a multiple of the number of banks m or equivalently, whenever n is a multiple of m/d where d is the greatest common divisor of m and s. As a consequence, there will be no bank conflict only if half the warp size is less than or equal to m/d. For devices of compute capability 1.x, this translates to no bank conflict only if d is equal to 1, or in other words, only if s is odd since m is a power of two.
How should I understand the above paragraph?
I hope someone can answer. Thanks in advance.
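My attempt at checking the stride rule numerically is the host-side sketch below. It counts, by brute force, how many threads of a half-warp land on the same bank for the access shared[BaseIndex + s * tid]. The helper names gcd_u and max_threads_per_bank are my own (hypothetical), and the values m = 16 banks and 16 threads per half-warp are my assumptions for compute capability 1.x. It also assumes s is at least 1, so that different threads read different words.

```c
#include <assert.h>

/* Greatest common divisor by Euclid's algorithm. */
static unsigned gcd_u(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical helper: for threads tid = 0 .. num_threads-1 reading the
   float at index base + s * tid, return the largest number of threads
   that fall on one bank. A result of 1 means no bank conflict.
   The array holds 32-bit floats, so the index is already a word index. */
static unsigned max_threads_per_bank(unsigned base, unsigned s,
                                     unsigned num_threads, unsigned num_banks)
{
    unsigned hits[64] = {0};
    unsigned worst = 0;
    assert(num_banks > 0 && num_banks <= 64);
    for (unsigned tid = 0; tid < num_threads; ++tid) {
        unsigned bank = (base + s * tid) % num_banks;
        hits[bank] += 1;
        if (hits[bank] > worst) {
            worst = hits[bank];
        }
    }
    return worst;
}
```

With 16 threads and 16 banks, the brute-force count seems to equal d = gcd(16, s): odd strides such as s = 1 or s = 3 give 1 (no conflict), s = 2 gives 2, s = 4 gives 4, and s = 16 puts all 16 threads on one bank. That would match the statement that there is no conflict only when s is odd, but I am not sure my reasoning is complete.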