The CUDA book I am studying covers in detail how shared memory affects performance. It introduces terms such as coalesced memory access, which leads on to the bank structure of shared memory.
The problem is that the book is old: when it discusses banks, it covers the Fermi and Kepler architectures, neither of which I have.
This leads to a question: is there a way to get more detailed information about shared memory using any tools? I decided to post here because I believe Nsight has become the de facto GPU query tool.
Specifically, but not limited to, I am looking for:
- bank width (32-bit or 64-bit)
- number of banks
- the shared memory bank addressing mode (32-bit or 64-bit)
Because the book's coverage and examples are tied to those older GPUs, it is hard to validate my understanding of the code examples, and I would like to adjust the code as needed for the newer architecture.
The 64-bit shared memory mode was dropped for architectures newer than Kepler. You can find relevant details for this and subsequent GPUs here.
Thank you, I think I got at least partial answers:
Shared memory has 32 banks that are organized such that successive 32-bit words map to successive banks. Each bank has a bandwidth of 32 bits per clock cycle.
A shared memory request for a warp does not generate a bank conflict between two threads that access any address within the same 32-bit word (even though the two addresses fall in the same bank). In that case, for read accesses, the word is broadcast to the requesting threads and for write accesses, each address is written by only one of the threads (which thread performs the write is undefined).
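To check my understanding of the two quoted paragraphs, here is a small host-side sketch in plain C++ (no GPU needed). It only models the rule as quoted: 32 banks, successive 32-bit words in successive banks, and no conflict when threads touch the same 32-bit word. The names `bank_of`, `conflict_degree` and `strided_offsets` are my own, not part of any CUDA API, and the model assumes every thread makes one 4-byte access; real hardware may treat wider accesses differently, so take this as an approximation rather than a measurement.

```cpp
#include <algorithm>
#include <array>
#include <cassert>   // only needed if you add your own assert() checks
#include <cstddef>
#include <cstdint>
#include <set>

constexpr int kNumBanks = 32;       // from the quoted documentation
constexpr int kBankWidthBytes = 4;  // 32-bit bank width
constexpr int kWarpSize = 32;

// Bank that a shared-memory byte offset maps to:
// successive 32-bit words go to successive banks.
inline int bank_of(std::uint32_t byte_offset) {
    return static_cast<int>((byte_offset / kBankWidthBytes) % kNumBanks);
}

// Largest number of DISTINCT 32-bit words requested from any one bank.
// 1 means conflict-free; N means an N-way conflict in this model.
// Threads hitting the same word count once (broadcast case).
inline int conflict_degree(const std::array<std::uint32_t, kWarpSize>& offsets) {
    std::array<std::set<std::uint32_t>, kNumBanks> words_per_bank;
    for (std::uint32_t off : offsets) {
        const std::uint32_t word = off / kBankWidthBytes;
        words_per_bank[word % kNumBanks].insert(word);
    }
    std::size_t worst = 0;
    for (const auto& words : words_per_bank) {
        worst = std::max(worst, words.size());
    }
    return static_cast<int>(worst);
}

// Byte offsets for a warp where thread t reads the 4-byte element
// at index t * stride_words (hypothetical access pattern).
inline std::array<std::uint32_t, kWarpSize> strided_offsets(std::uint32_t stride_words) {
    std::array<std::uint32_t, kWarpSize> offsets{};
    for (int t = 0; t < kWarpSize; ++t) {
        offsets[t] = static_cast<std::uint32_t>(t) * stride_words * kBankWidthBytes;
    }
    return offsets;
}
```

With this model, `conflict_degree(strided_offsets(1))` is 1 (each thread hits its own bank), a stride of 2 gives 2, a stride of 32 gives 32 (every thread lands in bank 0 on a different word), and a stride of 0 gives 1 because all threads read the same word and it is broadcast. An odd stride such as 33 is conflict-free again, which is the usual reason for padding a shared array by one element.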
Number of shared memory banks: 32
I am not sure about the bank width. Is it the same thing as the 64-bit or 32-bit operating mode?
No. The bank width is the maximum word size per bank, which, for all architectures >= 4.0, is 32 bits.