Does every thread block have its own 32 shared memory banks?

Kind of a stupid question, but I'd assume this is the case.

Shared memory is a per-SM resource. Thread blocks on different SMs use different physical resources. Thread blocks running on the same SM use the same physical resource, but normally different sections of it.

Viewed from a threadblock perspective, shared memory is a logical-per-threadblock resource. Each threadblock has its own logical copy/view of shared memory.

The bank nature of shared memory arises from the physical structure. Shared memory accesses are always subject to bank considerations.

These comments don't take into account Hopper threadblock clusters. Some statements above would have to be modified somewhat to have that architecture in view.
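To illustrate the "logical per-threadblock" point in code, here is a minimal sketch of my own (not from any official material, assuming a device recent enough for shared-memory atomics and device-side printf). Each block that runs the kernel gets its own instance of the `__shared__` variable, regardless of which SM it lands on or which other blocks share that SM:

```cpp
#include <cstdio>

// Each block that runs this kernel gets its own logical copy of "counter",
// regardless of which SM it runs on or which other blocks share that SM.
__global__ void per_block_shared()
{
    __shared__ int counter;               // one instance per threadblock
    if (threadIdx.x == 0) counter = 0;
    __syncthreads();

    atomicAdd(&counter, 1);               // only the threads of this block contribute
    __syncthreads();

    if (threadIdx.x == 0)
        printf("block %d sees counter = %d\n", blockIdx.x, counter);
}

int main()
{
    per_block_shared<<<4, 64>>>();        // every block prints 64, not 256
    cudaDeviceSynchronize();
    return 0;
}
```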

Sorry, I still didn't quite understand.

Thread blocks can define various shared memory sizes depending on their needs, right?
So does "Number of shared memory banks: 32" refer to the number of banks per thread block's shared memory, regardless of its size, or is it the number of banks that every SM maps its available shared memory to, or is it a global number of banks?

Yes. However, the size of the shared memory allocation/definition doesn't impact in any way the interpretation of banks. The first 32-bit location in the shared memory definition will correspond to bank 0, the second will correspond to bank 1, and so on, up to bank 31. After that the bank structure "wraps": the location at 32-bit word index 32 corresponds to bank 0, index 33 corresponds to bank 1, etc.
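If it helps, that mapping is just modular arithmetic on 32-bit word indices. A small sketch of my own (assuming the default 4-byte bank width; the helper name is made up for illustration):

```cpp
// For an allocation viewed as an array of 32-bit words, e.g.
//   extern __shared__ float smem[];
// the bank that word index i falls into is simply i % 32,
// independent of how large the allocation is.
__device__ __forceinline__ int bank_of_word(int word_index)
{
    return word_index % 32;   // 32 banks, 4 bytes wide each, repeating pattern
}

// bank_of_word(0)  == 0,  bank_of_word(31) == 31,
// bank_of_word(32) == 0,  bank_of_word(33) == 1, and so on.
```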

I’m not sure what that juxtaposition of words means.

Banks, the number of banks, bank ordering, and bank interpretation are not affected by the size of the shared memory allocation or definition.

It might be helpful if you have a solid understanding of how to apply shared memory principles from a programmer’s perspective. If you don’t have that already, I’d recommend unit 2 (basics) and unit 4 (banks/bank conflicts) in this online training series. There are numerous forum questions that touch on the topic of banks and bank conflicts as well, here is one example.

Oh ok, I read the slides of units 2 and 4. So if I understand correctly, bank conflicts only arise between different threads in the same warp, when the accesses occur during the same instruction.
Why can't different instructions from different warps on the same SM access shared memory through the same bank at the same time, causing a conflict due to two accesses (to different words) being made on that bank at the same time?
Or does the warp scheduler always schedule instructions in a way that this doesn't happen?

So what are banks, exactly? In my understanding, a bank is a physical piece of hardware through which memory accesses are performed? If so, I was wondering whether there are 32 banks per SM, or 32 banks for the entire GPU.

Since each SM has one block of shared memory that it allocates to the thread blocks executing on it, is this mapping always aligned only to the base of the SM's contiguous shared memory, or does it reset for every allocated block, so that for each thread block's shared memory the banks start at 0 again at the base address of its block?

Though given the above, I'd think that this mapping starts at the base of the entire SM's shared memory, not at the base of each thread block's allocated piece of shared memory.


I probably can't give an exact answer to such a question. From a programmer's perspective, a bank is a characteristic of physical shared memory that affects the way one achieves the highest bandwidth. I usually start, as I did in unit 4, by suggesting that one view shared memory as a 2D array and treat the columns as banks. (I'm summarizing here; there are important details.) The remaining concept is that a non-bank-conflicted access consists of only one item per column. Having N items out of a single access appear in a single column results in an N-way bank-conflicted access/serialization.
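To make the 2D-array picture concrete, here is an illustrative kernel of my own (not taken from unit 4, assuming one warp per block) that contrasts a conflict-free read pattern with a 32-way conflicted one on a 32x32 tile:

```cpp
#define TILE 32

// Illustrative only: contrasts a conflict-free and a 32-way conflicted
// shared-memory read pattern for a single warp (blockDim.x assumed to be 32).
__global__ void bank_demo(float *out)
{
    // View the columns as banks: element tile[r][c] lives in bank c,
    // because its word offset is r * 32 + c and (r * 32 + c) % 32 == c.
    __shared__ float tile[TILE][TILE];

    int t = threadIdx.x;

    // Fill the tile: in each iteration, thread t writes column t,
    // so the warp touches 32 different banks -> conflict-free.
    for (int r = 0; r < TILE; ++r)
        tile[r][t] = (float)(r * TILE + t);
    __syncthreads();

    float sum = 0.0f;

    // Conflict-free read: for each r, the warp reads one row,
    // i.e. 32 different columns -> 32 different banks.
    for (int r = 0; r < TILE; ++r)
        sum += tile[r][t];

    // 32-way conflicted read: for each c, all 32 threads read column c,
    // i.e. 32 different words in the same bank -> serialized access.
    for (int c = 0; c < TILE; ++c)
        sum += tile[t][c];

    out[t] = sum;   // keep the reads from being optimized away
}
```

The usual cure for the conflicted pattern is to pad the column dimension, e.g. `__shared__ float tile[32][33];`, which skews successive rows across different banks.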

I find that suffices for programming work. I can't provide any exact definition beyond that. Banks are not considered device-wide, and they can be fully understood and exploited using the programmer's model of shared memory. The bank activity of one shared memory logical instance does not affect the bank activity of another shared memory logical instance. If you like, you can say there are 32 banks per SM.

For allocated blocks that belong to separate threadblocks, it does not matter. Banks are a logical template; there is nothing peculiar about bank 0 that makes it somehow different from any other bank. And the bank structure repeats as you move linearly through memory. You could just as easily say bank numbering begins at 4, goes up through 31, then starts over at 0 and goes up through 3, with that pattern repeating as you move linearly through memory. Nothing about the concepts or analysis would change.
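One way to see why the starting number is irrelevant: whether two word indices hit the same bank depends only on their difference modulo 32, and adding the same base offset to both leaves that difference unchanged. A tiny sketch (the helper name is my own, for illustration):

```cpp
// Two 32-bit word indices i and j land in the same bank exactly when
// (i - j) is a multiple of 32. Shifting both by any base offset b gives
// ((i + b) - (j + b)) = (i - j), so relabeling which physical bank is
// "bank 0" changes nothing about which accesses conflict with each other.
__device__ __forceinline__ bool same_bank(int i, int j)
{
    return ((i - j) % 32) == 0;
}
```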

If you are asking about allocated blocks that belong to the same threadblock, I suggest giving a precise example of what you are referring to. However, I’m confident the ground we’ve already covered is sufficient to decode that space.


That really clears things up, thanks for the elaborate answer.
