Shared memory layout with multiple static shared memory declarations

tnwilly · November 18, 2024, 7:20am

Hi Experts,

I am optimizing my kernel by padding shared memory to reduce bank conflicts.

I’d like to know how multiple static shared memory declarations in a kernel are laid out.

If there is only one shared memory array in a kernel:

__global__ void exampleA(...) {
    __shared__ cuFloatComplex shmem[6][8];
    ....
}

According to the official programming guide, I think the shared memory will be laid out like this:

[Figure: layout of shmem across the shared memory banks]
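To make the single-array case concrete, here is a minimal sketch that computes which banks each row of shmem would occupy. It assumes 32 banks of 4 bytes each with successive 32-bit words mapped to successive banks, an 8-byte cuFloatComplex, and that the array starts at byte offset 0 of shared memory. The helper names bankOfByte and elemOffset are made up for this illustration.

```cuda
#include <cstdio>

// Assumptions: 32 banks, each 4 bytes wide; successive 32-bit words
// map to successive banks, wrapping around after bank 31.
constexpr unsigned kNumBanks  = 32;
constexpr unsigned kBankWidth = 4;  // bytes per bank word
constexpr unsigned kElemSize  = 8;  // sizeof(cuFloatComplex): two floats
constexpr unsigned kCols      = 8;  // shmem[6][8]

// Bank holding the 32-bit word at a given byte offset in shared memory.
__host__ __device__ constexpr unsigned bankOfByte(unsigned byteOffset)
{
    return (byteOffset / kBankWidth) % kNumBanks;
}

// Byte offset of shmem[row][col], assuming the array starts at offset 0.
__host__ __device__ constexpr unsigned elemOffset(unsigned row, unsigned col)
{
    return (row * kCols + col) * kElemSize;
}

int main()
{
    for (unsigned row = 0; row < 6; ++row) {
        // Real part of column 0 and imaginary part of column 7.
        unsigned first = bankOfByte(elemOffset(row, 0));
        unsigned last  = bankOfByte(elemOffset(row, kCols - 1) + 4);
        printf("row %u: banks %u..%u\n", row, first, last);
    }
    return 0;
}
```

Under these assumptions each row of 8 complex numbers spans 16 banks, so rows r and r + 2 start in the same bank and elements in the same column of those rows would conflict.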

If there are multiple static shared memory arrays, like:

__global__ void exampleB(...) {
    __shared__ cuFloatComplex shmemA[6][8];
    __shared__ cuFloatComplex shmemB[7 * 6][8];
    ....
}

I suspect that the two arrays will be assigned cyclically to the 32 banks, one after the other in declaration order.

Is that correct? Thanks!

Curefab · November 18, 2024, 11:35am

There is only one shared memory space; both arrays are placed in it.
Normally their relative placement does not matter, because a warp accesses either shmemA or shmemB in a given instruction, not both.
As far as I know, though, they are simply placed one after the other, each with the basic alignment of its type (e.g. 8 bytes).
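As a rough sketch of what "one after the other" would mean for exampleB, the following computes where shmemB would start. It assumes shmemA sits at byte offset 0, that the arrays are placed in declaration order, and that there are 32 banks of 4 bytes each; none of this is guaranteed by the compiler. The names alignUp, bankOfByte, and the constants are hypothetical.

```cuda
#include <cstddef>
#include <cstdio>

constexpr size_t kNumBanks     = 32;  // assumed bank count
constexpr size_t kBankWidth    = 4;   // assumed bytes per bank word
constexpr size_t kComplexSize  = 8;   // sizeof(cuFloatComplex)
constexpr size_t kComplexAlign = 8;   // alignment of cuFloatComplex

// Round an offset up to the next multiple of the alignment.
__host__ __device__ constexpr size_t alignUp(size_t offset, size_t alignment)
{
    return (offset + alignment - 1) / alignment * alignment;
}

// Bank holding the 32-bit word at a given byte offset.
__host__ __device__ constexpr size_t bankOfByte(size_t byteOffset)
{
    return (byteOffset / kBankWidth) % kNumBanks;
}

// Assumed layout: shmemA[6][8] first, shmemB[7 * 6][8] directly after it.
constexpr size_t kOffsetA = 0;
constexpr size_t kSizeA   = 6 * 8 * kComplexSize;
constexpr size_t kOffsetB = alignUp(kOffsetA + kSizeA, kComplexAlign);

int main()
{
    printf("shmemA: offset %zu, first bank %zu\n", kOffsetA, bankOfByte(kOffsetA));
    printf("shmemB: offset %zu, first bank %zu\n", kOffsetB, bankOfByte(kOffsetB));
    return 0;
}
```

With these sizes shmemA occupies 384 bytes, which is a whole number of 128-byte bank rows, so under the stated assumptions shmemB would also begin at bank 0.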

The placement becomes relevant if you use a single pointer that can point to an element of either array (shmemA or shmemB).

To control the layout yourself, you can create a struct in which you define the padding explicitly.
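A minimal sketch of the struct approach follows. The struct name SharedLayout, the kernel exampleStruct, and the padding of one extra column (kPad) are assumptions for illustration; the right amount of padding depends on the access pattern, which the question does not show.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuComplex.h>

constexpr int kPad = 1;  // hypothetical padding: one extra column per row

// Both arrays live in one struct, so their relative placement follows
// ordinary C++ struct layout rules instead of the compiler's choice.
struct SharedLayout {
    cuFloatComplex a[6][8 + kPad];
    cuFloatComplex b[7 * 6][8 + kPad];
};

static_assert(offsetof(SharedLayout, a) == 0, "a starts the struct");
static_assert(offsetof(SharedLayout, b) == sizeof(cuFloatComplex) * 6 * (8 + kPad),
              "b directly follows a");

__global__ void exampleStruct(cuFloatComplex *out)
{
    __shared__ SharedLayout s;

    int t = threadIdx.x;
    if (t < 6 * 8) {
        s.a[t / 8][t % 8] = make_cuFloatComplex((float)t, 0.0f);
    }
    __syncthreads();
    if (t < 6 * 8) {
        out[t] = s.a[t / 8][t % 8];  // padding column is never touched
    }
}

int main()
{
    cuFloatComplex *out = nullptr;
    cudaMalloc(&out, 6 * 8 * sizeof(cuFloatComplex));
    exampleStruct<<<1, 64>>>(out);
    cudaDeviceSynchronize();
    printf("offset of b inside the struct: %zu bytes\n", offsetof(SharedLayout, b));
    cudaFree(out);
    return 0;
}
```

Because the offsets are checked at compile time, a change to the padding that breaks the intended layout shows up as a build error rather than as a silent bank conflict.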

Or you can store each complex number as two separate floats.

In your case, you might want a warp to access 16 complex numbers at a time, or 16 odd-numbered and 16 even-numbered banks at the same time.
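The even/odd remark can be checked with a little arithmetic. The sketch below views the complex array as a flat array of floats, assuming 32 four-byte banks and an array that starts at byte offset 0, and verifies that 16 consecutive complex numbers touch every bank exactly once, with real parts in even banks and imaginary parts in odd banks. The helper names are hypothetical.

```cuda
#include <cstdio>

constexpr unsigned kNumBanks = 32;  // assumed bank count, 4 bytes per bank

// Viewed as floats, complex element i has its real part at float index 2*i
// and its imaginary part at float index 2*i + 1.
__host__ __device__ constexpr unsigned realBank(unsigned i)
{
    return (2 * i) % kNumBanks;
}

__host__ __device__ constexpr unsigned imagBank(unsigned i)
{
    return (2 * i + 1) % kNumBanks;
}

// True if 16 consecutive complex elements starting at `first`
// hit each of the 32 banks exactly once.
bool coversEachBankOnce(unsigned first)
{
    unsigned hits[kNumBanks] = {};
    for (unsigned i = first; i < first + 16; ++i) {
        ++hits[realBank(i)];
        ++hits[imagBank(i)];
    }
    for (unsigned b = 0; b < kNumBanks; ++b) {
        if (hits[b] != 1) return false;
    }
    return true;
}

int main()
{
    printf("16 consecutive complex values cover each bank once: %s\n",
           coversEachBankOnce(0) ? "yes" : "no");
    return 0;
}
```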

To check the actual layout, you can inspect the shared memory space in a debugger or print out the pointer addresses.
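One way to print the addresses is sketched below. It uses the __cvta_generic_to_shared intrinsic (available in recent CUDA toolkits, CUDA 11 or newer as far as I know) to turn a generic pointer into a byte offset within the shared memory window. The kernel name printSharedLayout, the sink parameter, and the bankOfByte helper are made up for this example, and the bank computation again assumes 32 banks of 4 bytes.

```cuda
#include <cstdio>
#include <cuComplex.h>

// Assumed mapping: 32 banks, 4 bytes each.
__host__ __device__ constexpr unsigned long long bankOfByte(unsigned long long byteOffset)
{
    return (byteOffset / 4) % 32;
}

__global__ void printSharedLayout(float *sink)
{
    __shared__ cuFloatComplex shmemA[6][8];
    __shared__ cuFloatComplex shmemB[7 * 6][8];

    // Touch both arrays so the compiler has no reason to drop them.
    if (threadIdx.x == 0) {
        shmemA[0][0] = make_cuFloatComplex(1.0f, 0.0f);
        shmemB[0][0] = make_cuFloatComplex(2.0f, 0.0f);
    }
    __syncthreads();

    if (threadIdx.x == 0) {
        // Byte offsets of both arrays inside the shared memory window.
        unsigned long long offA = __cvta_generic_to_shared(&shmemA[0][0]);
        unsigned long long offB = __cvta_generic_to_shared(&shmemB[0][0]);
        printf("shmemA: byte offset %llu, first bank %llu\n", offA, bankOfByte(offA));
        printf("shmemB: byte offset %llu, first bank %llu\n", offB, bankOfByte(offB));
        sink[0] = cuCrealf(shmemA[0][0]) + cuCrealf(shmemB[0][0]);
    }
}

int main()
{
    float *sink = nullptr;
    cudaMalloc(&sink, sizeof(float));
    printSharedLayout<<<1, 32>>>(sink);
    cudaDeviceSynchronize();  // also flushes device-side printf output
    cudaFree(sink);
    return 0;
}
```

The printed offsets show whether the compiler actually placed shmemB directly after shmemA for a given build; the result may differ between compiler versions and optimization settings.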
