Hi Experts,
I am optimizing my kernel by padding shared memory arrays to reduce bank conflicts.
I'd like to know how multiple statically declared shared memory arrays in a kernel are laid out.
If there is only one shared memory array in a kernel:
```cpp
#include <cuComplex.h>  // declares cuFloatComplex

__global__ void exampleA(...) {
    __shared__ cuFloatComplex shmem[6][8];
    // ...
}
```
According to the official programming documentation, I think this array is mapped onto the 32 banks in row-major order, with successive 32-bit words assigned to successive banks.
If there are multiple static shared memory arrays, for example:
```cpp
__global__ void exampleB(...) {
    __shared__ cuFloatComplex shmemA[6][8];
    __shared__ cuFloatComplex shmemB[7 * 6][8];
    // ...
}
```
I suspect that the two arrays are placed one after the other in declaration order, so the cyclic assignment to the 32 banks continues from the end of shmemA into shmemB.
Is it correct? Thanks!