Can you access only 32 bits per bank in shared memory even though a bank is 1 KB?


Appendix A of the programming guide (A.1.1) says that “the amount of shared memory available per multiprocessor is 16KB organized into 16 banks.”
So each bank has 1 KB of memory space available, right?

Elsewhere in the programming guide, it says: “In the case of the shared memory space, the banks are organized such that successive 32-bit words are assigned to successive banks and each bank has a bandwidth of 32 bits per two clock cycles.”

Does that mean only 32 bits (4 bytes) of the 1 KB available per bank are actually used?

Thanks for your help, and sorry if this question has already been discussed. I couldn’t find a related topic on a first search.


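The first 16 * 4 bytes of shared memory are stored at byte offset 0 of each bank, one 32-bit word per bank.

The next 16 * 4 bytes are stored at byte offset 4 of each bank,

and so on, until all 1 KB (256 words) of each bank is filled. The 32 bits mentioned in the guide is what a bank can deliver per access, not its capacity.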

So the memory location of the 32-bit word at word index X (byte address / 4) is

bank = X % 16

offset in bank = int(X / 16), counted in 32-bit words (multiply by 4 for the byte offset)
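
If it helps, here is the same mapping as a small plain C sketch (host code only, nothing CUDA-specific). It assumes the figures quoted in this thread (16 KB of shared memory, 16 banks, 32-bit words); the constant and function names are made up for illustration and do not come from the programming guide.

```c
#include <assert.h>
#include <stdio.h>

/* Figures quoted from the programming guide in this thread. */
enum {
    SMEM_BYTES = 16 * 1024, /* shared memory per multiprocessor */
    NUM_BANKS  = 16,        /* number of banks */
    WORD_BYTES = 4          /* one 32-bit word */
};

/* Bank that holds the 32-bit word at word index x. */
static unsigned bank_of(unsigned x) { return x % NUM_BANKS; }

/* Position of that word inside its bank, counted in 32-bit words. */
static unsigned word_in_bank(unsigned x) { return x / NUM_BANKS; }

/* Same position, expressed in bytes from the start of the bank. */
static unsigned byte_in_bank(unsigned x) { return word_in_bank(x) * WORD_BYTES; }

/* Words each bank can hold: 16 KB / 16 banks / 4 bytes per word. */
static unsigned words_per_bank(void) { return SMEM_BYTES / NUM_BANKS / WORD_BYTES; }

/* Hypothetical helper: print where one word lands. */
static void print_placement(unsigned x)
{
    assert(x < SMEM_BYTES / WORD_BYTES); /* word index must fit in 16 KB */
    printf("word %u -> bank %u, byte offset %u\n",
           x, bank_of(x), byte_in_bank(x));
}
```

Calling print_placement(17) from a main function prints "word 17 -> bank 1, byte offset 4", and words_per_bank() returns 256, which is the full 1 KB per bank.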

I wonder whether this is explicitly discussed anywhere in the programming guide or elsewhere…
Anyway, thanks a lot for clearing this up!