Bank conflicts are avoidable in most CUDA computations if care is taken when accessing shared memory arrays. In convolution, for example, this is simply a matter of padding the 2D array to a width that is not evenly divisible by the number of shared memory banks.
I am unable to understand how this padding avoids bank conflicts. Any pointers here?