Avoiding Bank Conflicts in convolution


Bank conflicts are avoidable in most CUDA computations if care is taken accessing shared memory arrays.
In convolution, for example this is just a matter of padding the 2D array to a width that is not evenly divisible by the number of shared memory banks.

I am unable to understand this padding funda related to avoiding bank conflicts.
Any pointers here ??

Thanks in advance

consider shared A[16][16],

A[i][j] belongs to bank-j for any i = 0:15

suppose you use 16 threads, then

A[threadIdx.x][0] is bank-conflict since A[0][0], A[1][0], … A[15][0] fall into bank-0

and cannnot be broadcasted.

however if you declare shared A[16][16+1]

then A[threadIdx.x][0] is not bank-conflict since

A[0][0] is bank-0

A[1][0] is bank-1

A[15][0] is bank-15

General speaking, under shared A[16][16+1], row-major index of A[i][j] is (i * 17 + j)

(i * 17 + j) = 16 * k + bank_id

or say bank_id = (i * 17 + j) mod 16

for j = 0, bank_id = (i * 17) mod 16

gcd(17,16) = 1, so A[i][0], i=0:15 is distributed into 16 bank.

Thanks for your reply!

A[1][0] is bank 1 but the statement:

A[i][j] belongs to bank-j for any i = 0:15

Says that A[1][0] is bank 0… as you said : A[ i ][ j ] belongs to bank-j for any i = 0:15

I am little bit confused!

sorry, I correct my setting, define type of matrix A is “float”,

shared float A[16][16]

and relationship between bank id and A[i][j] is shown in the following

bank 0	bank 1	bank 2	bank 3   ....   bank 15

A[0][0]   A[0][1]   A[0][2]   A[0][3]  ....   A[0][15]

A[1][0]   A[1][1]   A[1][2]   A[1][3]  ....   A[1][15]

A[2][0]   A[2][1]   A[2][2]   A[2][3]  ....   A[2][15]


A[15][0]  A[15][1]  A[15][2]  A[15][3]  ....  A[15][15]

that’s why I say A[i][j] belongs to bank-j for any i = 0:15