64-bit shared memory with minimal bank conflict?

I have an app that must use 64-bit (double) shared memory. My older books say that this causes serious bank conflicts, and one should go so far as to split the doubles into two 32-bit parts! But a recent reference says that SM 3.x hardware “has a mode that lets the developer increase bank size to 64 bits to avoid bank conflict”. But I can’t find anything about HOW to tell the hardware to do this. Do I have to request this some way? Or does the compiler tell the hardware? Thanks!

You have to set the bank width via cudaDeviceSetSharedMemConfig or cudaFuncSetSharedMemConfig.
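As a rough illustration of those two calls, here is a minimal sketch. The kernel scaleKernel and the helper bankSizeName are hypothetical names made up for this example; the runtime calls and enum values are the ones from the CUDA runtime API. Newer toolkits may flag these functions as deprecated, and on hardware with a fixed bank width the request is expected to have no effect, so the program queries the setting back instead of assuming it took.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: stages doubles through shared memory and scales them.
// It exists only so the per-function call below has a target.
__global__ void scaleKernel(double* data, int n)
{
    __shared__ double buf[256];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[threadIdx.x] = data[i];
    __syncthreads();
    if (i < n) data[i] = 2.0 * buf[threadIdx.x];
}

// Hypothetical helper: readable name for a bank-size setting.
const char* bankSizeName(cudaSharedMemConfig cfg)
{
    switch (cfg) {
        case cudaSharedMemBankSizeFourByte:  return "4-byte";
        case cudaSharedMemBankSizeEightByte: return "8-byte";
        default:                             return "default";
    }
}

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("no usable CUDA device\n");
        return 1;
    }
    printf("device 0: compute capability %d.%d\n", prop.major, prop.minor);

    // Device-wide request for 8-byte banks.
    cudaError_t err = cudaDeviceSetSharedMemConfig(cudaSharedMemBankSizeEightByte);
    printf("cudaDeviceSetSharedMemConfig: %s\n", cudaGetErrorString(err));

    // Per-kernel request; this applies to launches of scaleKernel only.
    err = cudaFuncSetSharedMemConfig(scaleKernel, cudaSharedMemBankSizeEightByte);
    printf("cudaFuncSetSharedMemConfig:   %s\n", cudaGetErrorString(err));

    // Read the device-wide setting back to see what is actually in effect.
    cudaSharedMemConfig cfg = cudaSharedMemBankSizeDefault;
    err = cudaDeviceGetSharedMemConfig(&cfg);
    if (err == cudaSuccess)
        printf("bank size in effect: %s\n", bankSizeName(cfg));
    return 0;
}
```

The device-wide call sets the preference for all kernels, while the per-function call targets a single kernel, so in practice only one of the two is usually needed.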

Note that 64-bit shared memory banks were removed in later device generations, as they did not prove as beneficial as hoped. So this only makes a difference if you are using a Kepler device.

Thanks for the answer! Too bad about dropping this, though. I do heavy floating-point computation, all in double, so this is a bad hit for me. Oh well. Thanks again!

Tim

It’s not difficult to avoid bank conflicts when using 64-bit quantities. Conceptually, it is no different than avoiding bank conflicts with 32-bit quantities. Developers who use double should not feel that bank conflicts are inevitable, with or without this Kepler feature.
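To make that concrete, here is a sketch of the usual padding trick applied to a tile of doubles, exactly as one would apply it to floats. This is an illustration and not code from the original question: the names transposeDouble, runTranspose and TILE are made up for the example. How 64-bit accesses are split into transactions differs between architectures, so a profiler should confirm the effect on a given device.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 32;  // tile edge; the block is TILE x TILE threads

// Transpose an n x n matrix of doubles through a padded shared-memory tile.
__global__ void transposeDouble(const double* in, double* out, int n)
{
    // Without the extra column, a column-wise read has a stride of 32 doubles
    // (256 bytes), which lands every thread of a warp on the same bank.
    // One padding column changes the stride to 33 doubles and spreads the
    // accesses across banks, the same idea as float tile[32][33].
    __shared__ double tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < n && y < n)
        tile[threadIdx.y][threadIdx.x] = in[y * n + x];    // row-wise write
    __syncthreads();

    x = blockIdx.y * TILE + threadIdx.x;
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < n && y < n)
        out[y * n + x] = tile[threadIdx.x][threadIdx.y];   // column-wise read
}

// Runs the kernel on an n x n test matrix and checks the result on the host.
bool runTranspose(int n)
{
    size_t count = static_cast<size_t>(n) * n;
    size_t bytes = count * sizeof(double);
    std::vector<double> hIn(count), hOut(count, 0.0);
    for (size_t i = 0; i < count; ++i) hIn[i] = static_cast<double>(i);

    double* dIn = nullptr;
    double* dOut = nullptr;
    if (cudaMalloc(&dIn, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, bytes) != cudaSuccess) { cudaFree(dIn); return false; }
    cudaMemcpy(dIn, hIn.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    transposeDouble<<<grid, block>>>(dIn, dOut, n);

    // The blocking copy also surfaces any error from the kernel launch.
    cudaError_t err = cudaMemcpy(hOut.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    if (err != cudaSuccess) return false;

    for (int r = 0; r < n; ++r)
        for (int c = 0; c < n; ++c)
            if (hOut[static_cast<size_t>(r) * n + c] != hIn[static_cast<size_t>(c) * n + r])
                return false;
    return true;
}

int main()
{
    printf("transpose %s\n", runTranspose(1024) ? "correct" : "FAILED");
    return 0;
}
```

Nothing in the kernel depends on the bank width setting; the padding is chosen from the access pattern, which is why the approach carries over from 32-bit to 64-bit data.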

In fact, you’ve given no indication that bank conflicts may be an issue at all for your code. Such information is trivial to ascertain with one of the profilers. And if the profiler confirms that bank conflicts are an issue, the method to address it is likely no different than in 32-bit mode. It’s quite likely, in fact, that this feature would not have affected such bank conflicts anyway. In retrospect, it was difficult to find example codes where 8-byte bank mode actually made a difference.

The simple use of double as a computation type does not indicate the likelihood of hitting bank conflicts, even in 32-bit bank mode.

I concur with txbob. In working with Kepler devices over a number of years, I never encountered a case where switching the shared memory bank width resulted in any noticeable application-level performance difference. I also recall other people finding essentially no performance difference and asking me about this observation. So presumably this was a common experience, which ultimately led to the removal of the configurable bank width feature in post-Kepler architectures.