Determine number of banks in shared memory

Shared memory is “striped” into banks. This leads to the whole issue of bank conflicts.

But how can you determine how many banks (“stripes”) exist in shared memory?

Poking around the forums, it seems that per-block shared memory is “striped” into 16 banks. But how do we know this? The threads suggesting this are a few years old. Have things changed? Is it fixed on all NVIDIA CUDA-capable cards? Is there a way to determine this from the runtime API (I don’t see it there, e.g. under cudaDeviceProp)? Is there a manual way to determine it at runtime?

Thank you.

You can find it in the Programming Guide: check Table 10 in Appendix F, “Compute Capabilities”, on p. 150.
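Since the bank count is a function of compute capability, one workaround is to hard-code that table. Here is a minimal sketch, assuming the figures I remember from the Programming Guide (16 banks on compute capability 1.x, 32 banks on 2.x and later). The helper name sharedMemBanks is made up for illustration and is not part of the CUDA API.

```cpp
// Hypothetical helper (not a CUDA API call): maps the major compute
// capability to the number of shared memory banks.
// Assumption: 16 banks on 1.x, 32 banks on 2.x and later. Check this
// against the Programming Guide for the devices you actually target.
// Returns 0 if the major version is not a valid compute capability.
int sharedMemBanks(int ccMajor) {
    if (ccMajor < 1) return 0;    // invalid compute capability
    if (ccMajor == 1) return 16;  // compute capability 1.x
    return 32;                    // 2.x and later (assumed to keep holding)
}
```

In a CUDA program you would call cudaGetDeviceProperties and pass the major field of the resulting cudaDeviceProp to this helper.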

Curiously enough, the number of banks doesn’t seem to be available at runtime via cuDeviceGetAttribute or cudaGetDeviceProperties. You can microbenchmark it out, of course.
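To show what such a microbenchmark would look for, here is a small host-side model in plain C++. It is not a real benchmark and it does not touch the GPU. It assumes the usual mapping where consecutive 32-bit words fall into consecutive banks, and that thread t of a group reads the word at index t * stride. The function name conflictDegree and its parameters are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Host-side model only (assumption: word w lives in bank w % banks).
// Thread t reads the 32-bit word at index t * stride.
// Returns the largest number of distinct words that any one bank has to
// serve, i.e. the modelled serialization factor for that access pattern.
// Requires banks > 0 and stride >= 0.
int conflictDegree(int stride, int banks, int threads) {
    std::vector<std::set<int>> wordsPerBank(banks);
    for (int t = 0; t < threads; ++t) {
        int word = t * stride;
        // Threads reading the same word do not conflict, so store distinct words.
        wordsPerBank[word % banks].insert(word);
    }
    std::size_t worst = 0;
    for (const auto& words : wordsPerBank) {
        worst = std::max(worst, words.size());
    }
    return static_cast<int>(worst);
}
```

In this model, with as many threads as banks, the degree is 1 for a stride of 1, doubles with each power-of-two stride, and first equals the full thread count when the stride equals the number of banks. On a real device the equivalent would be to time a kernel in which each thread reads shared memory at threadIdx.x * stride and look for the power-of-two stride at which the access time stops growing. How cleanly the timings show that is something you would have to measure.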