How to elegantly handle double arrays in shared memory without inducing bank conflicts?


I am trying to process some double arrays in shared memory. I need to access each element by linear addressing: the first thread accesses the first element, the second thread accesses the second element, and so on. It seems to me that this will induce a two-way bank conflict, since each double spans two 4-byte banks. How can I resolve this issue?

Please note that I need high precision for my computations, so replacing double with float is not an option for me.

You’re not stating which hardware (compute architecture) you’re developing on, which is critical information for making a meaningful performance-related suggestion.

A related thread may be:

My understanding is that on Kepler you need to select the 8-byte shared memory bank mode, and that on later architectures (Maxwell, Pascal, Volta) this mode is no longer supported or necessary.
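For reference, here is a minimal sketch of how the 8-byte bank mode can be requested through the runtime API call cudaDeviceSetSharedMemConfig. The kernel name scaleDoubles, the array size, and the launch configuration are made up for illustration. As far as I know, the call does nothing on devices with a fixed bank size, and newer toolkits may flag it as deprecated, so treat this as an assumption to check against your toolkit's documentation rather than a definitive recipe.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: thread t of each block reads and writes element t
// of a shared double array (the linear addressing described in the question).
__global__ void scaleDoubles(const double* in, double* out, int n)
{
    extern __shared__ double tile[];            // one double per thread
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tile[threadIdx.x] = in[i];       // consecutive threads, consecutive doubles
    __syncthreads();
    if (i < n) out[i] = 2.0 * tile[threadIdx.x];
}

int main()
{
    const int n = 256;
    const int threads = 128;

    // Request 8-byte shared memory banks. Assumption: this only matters on
    // devices with configurable banks (Kepler); elsewhere it should have no effect.
    cudaError_t err = cudaDeviceSetSharedMemConfig(cudaSharedMemBankSizeEightByte);
    if (err != cudaSuccess)
        printf("bank config request not applied: %s\n", cudaGetErrorString(err));

    double h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = 0.5 * i;

    double *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(double));
    cudaMalloc(&d_out, n * sizeof(double));
    cudaMemcpy(d_in, h_in, n * sizeof(double), cudaMemcpyHostToDevice);

    // Dynamic shared memory: one double per thread in the block.
    scaleDoubles<<<n / threads, threads, threads * sizeof(double)>>>(d_in, d_out, n);

    // This blocking copy also waits for the kernel to finish.
    err = cudaMemcpy(h_out, d_out, n * sizeof(double), cudaMemcpyDeviceToHost);

    bool ok = (err == cudaSuccess);
    for (int i = 0; ok && i < n; ++i)
        ok = (h_out[i] == 2.0 * h_in[i]);       // exact for these input values

    printf("%s\n", ok ? "results match" : "mismatch or CUDA error");

    cudaFree(d_in);
    cudaFree(d_out);
    return ok ? 0 : 1;
}
```

Whether the setting actually changes anything is best confirmed with a profiler on your specific GPU, for example by comparing the shared memory bank conflict counters with and without the call.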