Does every thread block have its own 32 shared memory banks?

Kind of a stupid question, but I'd assume this is the case.

Shared memory is a per-SM resource. Thread blocks on different SMs use different physical resources. Thread blocks running on the same SM use the same physical resource, but normally different sections of it.

Viewed from a threadblock perspective, shared memory is a logical-per-threadblock resource. Each threadblock has its own logical copy/view of shared memory.

The bank nature of shared memory arises from the physical structure. Shared memory accesses are always subject to bank considerations.

These comments don't take into account Hopper threadblock clusters. Some statements above would have to be modified somewhat to have that architecture in view.
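To illustrate the "logical per-threadblock" point in code, here is a minimal sketch of my own (not from any official material, assuming a device recent enough for shared-memory atomics and device-side printf). Each block that runs the kernel gets its own instance of the `__shared__` variable, regardless of which SM it lands on or which other blocks share that SM:

```cpp
#include <cstdio>

// Each block that runs this kernel gets its own logical copy of "counter",
// regardless of which SM it runs on or which other blocks share that SM.
__global__ void per_block_shared()
{
    __shared__ int counter;               // one instance per threadblock
    if (threadIdx.x == 0) counter = 0;
    __syncthreads();

    atomicAdd(&counter, 1);               // only the threads of this block contribute
    __syncthreads();

    if (threadIdx.x == 0)
        printf("block %d sees counter = %d\n", blockIdx.x, counter);
}

int main()
{
    per_block_shared<<<4, 64>>>();        // every block prints 64, not 256
    cudaDeviceSynchronize();
    return 0;
}
```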

Sorry, I still didn't quite understand.

Thread blocks can define various shared memory sizes depending on their needs, right?
So does "Number of shared memory banks: 32" refer to the number of banks per thread block's shared memory, regardless of its size, or is it the number of banks that every SM maps its available shared memory to, or is it a global number of banks?

Yes. However, the size of the shared memory allocation/definition doesn't impact in any way the interpretation of banks. The first 32-bit location in the shared memory definition will correspond to bank 0, the second will correspond to bank 1, and so on, up to bank 31. After that the bank structure "wraps": the location at 32-bit word index 32 corresponds to bank 0, index 33 corresponds to bank 1, etc.
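If it helps, that mapping is just modular arithmetic on 32-bit word indices. A small sketch of my own (assuming the default 4-byte bank width; the helper name is made up for illustration):

```cpp
// For an allocation viewed as an array of 32-bit words, e.g.
//   extern __shared__ float smem[];
// the bank that word index i falls into is simply i % 32,
// independent of how large the allocation is.
__device__ __forceinline__ int bank_of_word(int word_index)
{
    return word_index % 32;   // 32 banks, 4 bytes wide each, repeating pattern
}

// bank_of_word(0)  == 0,  bank_of_word(31) == 31,
// bank_of_word(32) == 0,  bank_of_word(33) == 1, and so on.
```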

I’m not sure what that juxtaposition of words means.

Banks, the number of banks, bank ordering, and bank interpretation are not affected by the size of the shared memory allocation or definition.

It might be helpful if you have a solid understanding of how to apply shared memory principles from a programmer’s perspective. If you don’t have that already, I’d recommend unit 2 (basics) and unit 4 (banks/bank conflicts) in this online training series. There are numerous forum questions that touch on the topic of banks and bank conflicts as well, here is one example.

Oh ok, I read the slides of units 2 and 4. So if I understand correctly, bank conflicts only arise between different threads in the same warp, when the accesses occur during the same instruction.
Why can't different instructions from different warps on the same SM access shared memory through the same bank at the same time, causing a conflict due to two accesses (to different words) being made on that bank at the same time?
Or does the warp scheduler always schedule instructions in a way that this doesn't happen?

So what are banks, exactly? In my understanding, a bank is a physical piece of hardware through which memory accesses are performed? If so, I was wondering whether there are 32 banks per SM, or 32 banks for the entire GPU.

Since each SM has one block of shared memory that it allocates to the thread blocks executing on it, is this mapping always aligned only to the base of the SM's contiguous shared memory, or does it reset for every allocated block, so that for each thread block's shared memory the banks start at 0 again at the base address of its block?

Though given the above, I'd think that this mapping starts at the base of the entire SM's shared memory, not at the base of each thread block's allocated piece of shared memory.


I probably can't give an exact answer to such a question. From a programmer's perspective, a bank is a characteristic of physical shared memory that affects the way one achieves the highest bandwidth. I usually start, as I did in unit 4, by suggesting that one view shared memory as a 2D array and treat the columns as banks. (I'm summarizing here; there are important details.) The remaining concept is that a non-bank-conflicted access consists of only one item per column. Having N items out of a single access appear in a single column results in an N-way bank-conflicted access/serialization.
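To make the 2D-array picture concrete, here is an illustrative kernel of my own (not taken from unit 4, assuming one warp per block) that contrasts a conflict-free read pattern with a 32-way conflicted one on a 32x32 tile:

```cpp
#define TILE 32

// Illustrative only: contrasts a conflict-free and a 32-way conflicted
// shared-memory read pattern for a single warp (blockDim.x assumed to be 32).
__global__ void bank_demo(float *out)
{
    // View the columns as banks: element tile[r][c] lives in bank c,
    // because its word offset is r * 32 + c and (r * 32 + c) % 32 == c.
    __shared__ float tile[TILE][TILE];

    int t = threadIdx.x;

    // Fill the tile: in each iteration, thread t writes column t,
    // so the warp touches 32 different banks -> conflict-free.
    for (int r = 0; r < TILE; ++r)
        tile[r][t] = (float)(r * TILE + t);
    __syncthreads();

    float sum = 0.0f;

    // Conflict-free read: for each r, the warp reads one row,
    // i.e. 32 different columns -> 32 different banks.
    for (int r = 0; r < TILE; ++r)
        sum += tile[r][t];

    // 32-way conflicted read: for each c, all 32 threads read column c,
    // i.e. 32 different words in the same bank -> serialized access.
    for (int c = 0; c < TILE; ++c)
        sum += tile[t][c];

    out[t] = sum;   // keep the reads from being optimized away
}
```

The usual cure for the conflicted pattern is to pad the column dimension, e.g. `__shared__ float tile[32][33];`, which skews successive rows across different banks.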

I find that suffices for programming work. I can't provide any exact definition beyond that. Banks are not considered device-wide, and they can be fully understood and exploited using the programmer's model of shared memory. The bank activity of one shared memory logical instance does not affect the bank activity of another shared memory logical instance. If you like, you can say there are 32 banks per SM.

For allocated blocks that belong to separate threadblocks, it does not matter. Banks are a logical template; there is nothing peculiar about bank 0 that makes it somehow different from any other bank. And the bank structure repeats as you move linearly through memory. You could just as easily say bank numbering begins at 4, goes up through 31, then starts over at 0 and goes up through 3, with that pattern repeating as you move linearly through memory. Nothing about the concepts or analysis would change.
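One way to see why the starting number is irrelevant: whether two word indices hit the same bank depends only on their difference modulo 32, and adding the same base offset to both leaves that difference unchanged. A tiny sketch (the helper name is my own, for illustration):

```cpp
// Two 32-bit word indices i and j land in the same bank exactly when
// (i - j) is a multiple of 32. Shifting both by any base offset b gives
// ((i + b) - (j + b)) = (i - j), so relabeling which physical bank is
// "bank 0" changes nothing about which accesses conflict with each other.
__device__ __forceinline__ bool same_bank(int i, int j)
{
    return ((i - j) % 32) == 0;
}
```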

If you are asking about allocated blocks that belong to the same threadblock, I suggest giving a precise example of what you are referring to. However, I’m confident the ground we’ve already covered is sufficient to decode that space.


That really clears things up, thanks for the elaborate answer.
