CUDA shared memory

For an array with an even length, shared memory padding reduces bank conflicts and therefore the total execution time, but for an array with an odd length, padding increases bank conflicts and therefore the total execution time. Can somebody share their opinion on this phenomenon?

That depends on the access pattern.

You typically need padding if you access the shared memory array in at least two different ways, e.g. a 2-dimensional array that you access both row-wise and column-wise. The trick is to choose the padding so that neither of those accesses leads to bank conflicts.
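A minimal sketch of that classic case (the kernel name, the 32×32 tile size, and the square thread block are my own assumptions, not from the post): the tile is written row-wise and read column-wise, and the +1 padding shifts each row by one bank so the column-wise read does not serialize.

```
#define TILE_DIM 32

// Transpose sketch: the same shared array is accessed row-wise (write)
// and column-wise (read). Launch with a 32x32 thread block.
__global__ void transposeTile(float *out, const float *in, int width)
{
    // The +1 padding makes the row stride 33 floats, so the elements of one
    // column fall into 32 different banks instead of the same bank.
    __shared__ float tile[TILE_DIM][TILE_DIM + 1];

    int x = blockIdx.x * TILE_DIM + threadIdx.x;
    int y = blockIdx.y * TILE_DIM + threadIdx.y;
    tile[threadIdx.y][threadIdx.x] = in[y * width + x];   // row-wise write
    __syncthreads();

    x = blockIdx.y * TILE_DIM + threadIdx.x;              // swapped block offsets
    y = blockIdx.x * TILE_DIM + threadIdx.y;
    out[y * width + x] = tile[threadIdx.x][threadIdx.y];  // column-wise read
}
```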

You can calculate the bank number of each access.
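For the default configuration of 32 banks that are 4 bytes wide, the bank is simply the word index modulo 32. A small helper for doing this calculation on paper (my own sketch, not from the post):

```
// Bank of a byte offset into shared memory, assuming the default
// configuration of 32 banks, each 4 bytes wide.
__host__ __device__ inline unsigned bankOf(size_t byteOffset)
{
    return (byteOffset / 4) % 32;
}

// For a float array with a row stride of 32, a warp reading one column hits
// bankOf((t * 32) * 4) = 0 for every thread t: a 32-way conflict.
// With a stride of 33 (one float of padding per row), the same read gives
// bankOf((t * 33) * 4) = t % 32, i.e. all 32 banks: no conflict.
```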

You can also vary the access width between 32, 64 and 128 bits.
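A rough sketch of what the different widths look like in code (the array sizes and kernel name are assumptions): each thread of a warp issues one 32-bit, one 64-bit and one 128-bit load, and the wider loads change both how addresses map to banks and how many shared memory transactions the warp needs.

```
__global__ void accessWidths(float4 *out)
{
    __shared__ float  s32 [32];    // 32-bit elements
    __shared__ float2 s64 [32];    // 64-bit elements
    __shared__ float4 s128[32];    // 128-bit elements

    int t = threadIdx.x;           // assume a single warp, t = 0..31

    s32[t]  = t;                   // fill with some data
    s64[t]  = make_float2(t, t);
    s128[t] = make_float4(t, t, t, t);
    __syncthreads();

    float  a = s32[t];             // one 32-bit load per thread
    float2 b = s64[t];             // one 64-bit load per thread
    float4 c = s128[t];            // one 128-bit load per thread

    out[t] = make_float4(a, b.x, c.x, c.y);
}
```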

Or you can interleave two accesses: in instruction 1 the even threads perform access A and the odd threads perform access B; in instruction 2 it is the other way around. Each thread then has to sort its data, depending on whether its index is even or odd.
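A rough sketch of that interleaving idea (the arrays, sizes and kernel name are placeholders I made up): each thread needs one element from A and one from B; splitting the warp by even/odd lane means the two reads of each array are spread over two instructions, and each thread swaps its two values back into order afterwards.

```
__global__ void interleavedAccess(float *out)
{
    __shared__ float A[32];
    __shared__ float B[32];

    int  t   = threadIdx.x;        // assume a single warp, t = 0..31
    bool odd = (t & 1) != 0;

    A[t] = t;                      // fill with some data
    B[t] = 100 + t;
    __syncthreads();

    // Instruction 1: even lanes read A, odd lanes read B.
    float first  = odd ? B[t] : A[t];
    // Instruction 2: the opposite.
    float second = odd ? A[t] : B[t];

    // Each thread sorts its two values back into (a, b) order,
    // depending on whether its lane is odd or even.
    float a = odd ? second : first;
    float b = odd ? first  : second;

    out[t] = a + b;
}
```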

As striker159 said, it depends on the access pattern.