Let’s say I allocate the following double array in shared memory:
__shared__ double A[16][17]; // 17 is chosen as the standard padding trick to avoid bank conflicts
Given that, can I assume the following mapping of elements to banks?
A[0][0] → bank1, bank2
A[0][1] → bank3, bank4
…
A[0][15] → bank31, bank32
A[0][16] → bank1, bank2
A[1][0] → bank3, bank4
…
A[2][0] → bank5, bank6
…
A[15][0] → bank31, bank32
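
A minimal host-side sketch of the arithmetic behind this mapping. It assumes 32 banks that are each 4 bytes wide, an array that starts at bank 1, and 8-byte doubles, so every element covers two consecutive banks. The names `first_bank`, `kBanks`, `kBankBytes`, `kElemBytes` and `kCols` are made up for illustration:

```cpp
// Host-side model only; nothing here runs on the GPU.
constexpr int kBanks = 32;      // assumption: 32 shared-memory banks
constexpr int kBankBytes = 4;   // assumption: each bank is 4 bytes wide
constexpr int kElemBytes = 8;   // size of one double
constexpr int kCols = 17;       // padded row length from A[16][17]

// 1-based number of the first bank touched by A[row][col].
// The element also occupies the following bank, since it is 8 bytes long.
// Assumes A itself starts at bank 1.
constexpr int first_bank(int row, int col) {
    return ((row * kCols + col) * kElemBytes / kBankBytes) % kBanks + 1;
}

// Spot checks against the list above.
static_assert(first_bank(0, 16) == 1, "A[0][16] wraps around to banks 1, 2");
static_assert(first_bank(1, 0) == 3, "A[1][0] starts at bank 3");
```

For example, `first_bank(2, 0)` gives the first of the two banks listed for A[2][0].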
Let’s say that I have 16x16 threads in a block and my shared memory indexing looks something like A[threadIdx.x][threadIdx.y]. On Fermi, regardless of what additional padding I use to avoid bank conflicts (e.g. 17 as the second dimension), conflicts between threads from the first half of the warp and threads from the second half of the warp (e.g. A[1][0] and A[0][1], which both map to bank3, bank4) are unavoidable in this case.
Is this an accurate assessment of the situation?
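
To make the question concrete, here is a rough host-side sketch that tabulates which bank each lane of the first warp would start in. It assumes 32 banks of 4 bytes each, that threadIdx.x varies fastest within a warp (so lanes 0 to 15 have threadIdx.y = 0 and lanes 16 to 31 have threadIdx.y = 1), and that the array starts at the first bank. The helper names `lane_bank` and `max_lanes_per_bank` are hypothetical, and the sketch only counts overlapping banks; it does not model how the hardware actually issues 64-bit requests.

```cpp
#include <algorithm>
#include <array>

constexpr int kBanks = 32;        // assumption: 32 shared-memory banks
constexpr int kWordsPerElem = 2;  // 8-byte double over 4-byte banks
constexpr int kCols = 17;         // padded row length from A[16][17]

// 0-based first bank of A[threadIdx.x][threadIdx.y] for a lane of warp 0
// in a 16x16 block, assuming threadIdx.x varies fastest.
int lane_bank(int lane) {
    int tx = lane % 16;  // threadIdx.x
    int ty = lane / 16;  // threadIdx.y (0 or 1 within warp 0)
    return ((tx * kCols + ty) * kWordsPerElem) % kBanks;
}

// Largest number of lanes in [begin, end) whose elements start in the same
// bank. Every lane reads a different element, so each hit is a distinct
// address.
int max_lanes_per_bank(int begin, int end) {
    std::array<int, kBanks> hits{};  // zero-initialized counters
    for (int lane = begin; lane < end; ++lane) {
        ++hits[lane_bank(lane)];
    }
    return *std::max_element(hits.begin(), hits.end());
}
```

Under these assumptions, `max_lanes_per_bank(0, 32)` comes out as 2, while `max_lanes_per_bank(0, 16)` and `max_lanes_per_bank(16, 32)` are both 1. So the overlap only appears when both halves of the warp are considered together, which is the case being asked about.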