If I dynamically allocate shared memory in a kernel and do all the type alignment correctly (I have both int and long long int __shared__ arrays, so I will choose a 32-bit type and be sure to have the long long int arrays point to even-numbered indices of the base extern __shared__ allocation), can anyone confirm for me that this will produce the same performance as statically allocated __shared__ memory? I have realized that I need to make one of my kernels launchable as 1 x 1024, 2 x 512, 4 x 256, or even 8 x 128 per streaming multiprocessor for best results. The current setup statically allocates 24 - 36 kB of __shared__ per block for up to 1024 atoms, so reducing this to accommodate the thread block size is critical for staging multiple blocks on the same SM.
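To put numbers on how many blocks fit per SM at each size, one option is to ask the runtime with cudaOccupancyMaxActiveBlocksPerMultiprocessor. The following is only a sketch: occupancyProbeKernel and sharedBytesForBlock are hypothetical stand-ins, and the 36 bytes of __shared__ per thread is my assumption (36 kB at 1024 threads, scaled linearly), so the real kernel and its real footprint would need to be substituted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; only its resource footprint matters here.
extern __shared__ int shmem_array[];
__global__ void occupancyProbeKernel(int* out) {
  shmem_array[threadIdx.x] = threadIdx.x;
  __syncthreads();
  if (threadIdx.x == 0) out[blockIdx.x] = shmem_array[blockDim.x - 1];
}

// Assumption: 36 bytes of __shared__ per thread, i.e. 36 kB at 1024 threads.
size_t sharedBytesForBlock(int block_size) {
  return 36 * static_cast<size_t>(block_size);
}

int main() {
  const int block_sizes[] = { 1024, 512, 256, 128 };
  for (int block_size : block_sizes) {
    const size_t smem_bytes = sharedBytesForBlock(block_size);
    int blocks_per_sm = 0;
    // Reports how many blocks of this size and dynamic __shared__ footprint
    // can be resident on one SM for this kernel on the current device.
    cudaError_t status = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocks_per_sm, occupancyProbeKernel, block_size, smem_bytes);
    if (status != cudaSuccess) {
      printf("Occupancy query failed: %s\n", cudaGetErrorString(status));
      return 1;
    }
    printf("%4d threads, %6zu bytes dynamic __shared__: %d blocks per SM\n",
           block_size, smem_bytes, blocks_per_sm);
  }
  return 0;
}
```

The result also reflects register usage and the device's thread limits, so it should be run against the actual kernel rather than a probe like this one.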
And the syntax for any one CUDA unit is simply to declare:
extern __shared__ int shmem_array[];
__global__ __launch_bounds__(MAX_THREADS_PER_BLOCK, MIN_BLOCKS_PER_SM)
void myFirstKernel(const int stride) {
  // Three int segments, each stride elements long
  int* pointer_a = &shmem_array[0];
  int* pointer_b = &shmem_array[stride];
  int* pointer_c = &shmem_array[2 * stride];
}
__global__ __launch_bounds__(MAX_THREADS_PER_BLOCK, MIN_BLOCKS_PER_SM)
void mySecondKernel(const int stride) {
  // Each long long int element spans two ints, so the 64-bit segments start
  // at even int indices (assumes the base allocation is 8-byte aligned)
  long long int* lptr_a = (long long int*)&shmem_array[0];
  long long int* lptr_b = (long long int*)&shmem_array[2 * stride];
  int* regular_ptr_c = &shmem_array[4 * stride];
}
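For completeness, here is a minimal sketch of how the launch side of this could look, with the dynamic allocation size passed as the third launch parameter. Note one deliberate change from the declarations above: the base array is declared as long long int rather than int, so that the 8-byte alignment of the 64-bit segments follows from the type instead of from index bookkeeping. The names carveSharedKernel, carveBytes, and expectedSum are hypothetical, and the stride, thread, and block counts are arbitrary.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Base type is long long int so that offset zero is 8-byte aligned.
extern __shared__ long long int shmem_base[];

__global__ void carveSharedKernel(const int stride, long long int* result) {
  long long int* lptr_a = shmem_base;                             // stride elements
  long long int* lptr_b = &shmem_base[stride];                    // stride elements
  int* iptr_c = reinterpret_cast<int*>(&shmem_base[2 * stride]);  // stride ints
  for (int i = threadIdx.x; i < stride; i += blockDim.x) {
    lptr_a[i] = i;
    lptr_b[i] = 2LL * i;
    iptr_c[i] = 3 * i;
  }
  __syncthreads();
  if (threadIdx.x == 0) {
    long long int sum = 0;
    for (int i = 0; i < stride; i++) sum += lptr_a[i] + lptr_b[i] + iptr_c[i];
    result[blockIdx.x] = sum;
  }
}

// Bytes needed for two long long int segments and one int segment.
size_t carveBytes(int stride) {
  return stride * (2 * sizeof(long long int) + sizeof(int));
}

// Sum of i + 2i + 3i for i in [0, stride), which is 3 * stride * (stride - 1).
long long int expectedSum(int stride) {
  return 3LL * stride * (stride - 1);
}

int main() {
  const int stride = 256;
  const int threads = 128;
  const int blocks = 4;
  long long int* d_result = nullptr;
  cudaMalloc(&d_result, blocks * sizeof(long long int));
  // The third launch parameter sets the size of the extern __shared__ array.
  carveSharedKernel<<<blocks, threads, carveBytes(stride)>>>(stride, d_result);
  long long int h_result[blocks];
  cudaError_t status = cudaMemcpy(h_result, d_result, sizeof(h_result),
                                  cudaMemcpyDeviceToHost);
  if (status != cudaSuccess) {
    printf("CUDA error: %s\n", cudaGetErrorString(status));
    return 1;
  }
  for (int b = 0; b < blocks; b++) {
    printf("block %d: %s\n", b,
           (h_result[b] == expectedSum(stride)) ? "ok" : "MISMATCH");
  }
  cudaFree(d_result);
  return 0;
}
```

Because the size is a launch argument, the same kernel can be launched at 1024, 512, 256, or 128 threads with a footprint scaled to match, which is the point of going dynamic in the first place.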
It seems, from typing that out, that even if the baseline performance of accesses to a statically allocated array and a dynamically allocated array might be similar, it may require more registers to keep track of the pointers into the dynamically allocated array, or (if I were to store pointer_a, pointer_b, and pointer_c as __shared__ values themselves) additional accesses to __shared__ just to get the pointers to the __shared__ memory I want to access. I can compile more variants of the kernel, for the typical “large block” case and then a “small block” case at the other end of the spectrum. That would cover the vast majority of what’s relevant.
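The multiple-variants route could be sketched with a template parameter, so that each variant keeps statically sized __shared__ arrays and its own launch bounds. This is only an illustration under assumed names: staticVariantKernel, launchVariant, and expectedFirstValue are hypothetical, and the array contents are placeholders for the real per-atom data.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One compiled variant per block size; array sizes are fixed at compile time.
template <int ATOMS_PER_BLOCK>
__global__ void __launch_bounds__(ATOMS_PER_BLOCK, 1024 / ATOMS_PER_BLOCK)
staticVariantKernel(int* out) {
  __shared__ long long int acc[ATOMS_PER_BLOCK];
  __shared__ int idx[ATOMS_PER_BLOCK];
  const int t = threadIdx.x;
  idx[t] = ATOMS_PER_BLOCK - 1 - t;  // reversed index, placeholder data
  acc[t] = 2LL * t;
  __syncthreads();
  out[blockIdx.x * ATOMS_PER_BLOCK + t] = static_cast<int>(acc[idx[t]]);
}

// Host-side dispatch between the "large block" and "small block" variants.
bool launchVariant(int threads, int blocks, int* d_out) {
  switch (threads) {
    case 1024: staticVariantKernel<1024><<<blocks, 1024>>>(d_out); return true;
    case 128:  staticVariantKernel<128><<<blocks, 128>>>(d_out);   return true;
    default:   return false;  // no variant compiled for this block size
  }
}

// Thread 0 reads acc[ATOMS_PER_BLOCK - 1], which holds 2 * (atoms - 1).
int expectedFirstValue(int atoms) {
  return 2 * (atoms - 1);
}

int main() {
  const int sizes[] = { 1024, 128 };
  int* d_out = nullptr;
  cudaMalloc(&d_out, 1024 * sizeof(int));
  for (int threads : sizes) {
    if (!launchVariant(threads, 1, d_out)) return 1;
    int first = 0;
    cudaError_t status = cudaMemcpy(&first, d_out, sizeof(int),
                                    cudaMemcpyDeviceToHost);
    if (status != cudaSuccess) {
      printf("CUDA error: %s\n", cudaGetErrorString(status));
      return 1;
    }
    printf("%4d-thread variant: %s\n", threads,
           (first == expectedFirstValue(threads)) ? "ok" : "MISMATCH");
  }
  cudaFree(d_out);
  return 0;
}
```

The trade-off is compile time and binary size in exchange for compile-time array offsets, so no pointers into a shared base need to be carried in registers.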
Wow, this gets quite involved… and every choice carries a degree of bookkeeping to manage.