I am writing some CUDA code where each thread within a block needs to access ALL values in shared memory in order to do its computation. In my case, the shared memory holds a bunch of arrays.
So the shared memory size AND the shared memory indexing have nothing to do with the thread index; every thread reads all of the data in shared memory. I have a piece of code as shown below.
The question I have: is the shared memory initialized every time a thread runs, or only once per block?
What I want to happen is for the shared memory to be initialized only once per block, with all threads then reading from it. Does the code below do this, or does it initialize the shared mem once per thread?
Thanks,
- Alex
CODE BELOW
#############
extern __shared__ int array[];
__global__ void kernel_func(…) {
// Initialize the shared memory
float* shared_d_thresholds = (float*) array; // 6 floats = 24 bytes
int3* shared_d_KernelDimsCenter = (int3*) &shared_d_thresholds[6]; // 1 int3 = 12 bytes
int3* shared_d_KernelDims = (int3*) &shared_d_KernelDimsCenter[1]; // 1 int3 = 12 bytes
int4* shared_d_signalMapSizes = (int4*) &shared_d_KernelDims[1]; // 1 int4 = 16 bytes
int2* shared_d_AADims = (int2*) &shared_d_signalMapSizes[1]; // 1 int2 = 8 bytes
int* shared_d_numResultingPoints = (int*) &shared_d_AADims[1]; // 1 int = 4 bytes
float*** shared_d_KernelWeights = (float***) &shared_d_numResultingPoints[1]; // rest of shared memory
// NOTE: pointer arithmetic is in elements, not bytes, so the offsets
// above count elements of the previous array (e.g. [6] floats, not [24]).
// Fill the shared memory with data from global device memory...
.
.
shared_d_KernelDimsCenter->x = d_KernelDimsCenter->x;
shared_d_KernelDimsCenter->y = d_KernelDimsCenter->y;
shared_d_KernelDimsCenter->z = d_KernelDimsCenter->z;
.
// Use indexing within large array to assign appropriate data to crunch for current thread
}
void entry_point_to_CUDA(…) {
init …
// Set up the kernel execution configuration
dim3 dimGrid ...
dim3 dimBlock ...
size_t Ns ...
printf("Allocating %d bytes of shared memory per thread block\n", int(Ns));
kernel_func<<<dimGrid,dimBlock,Ns>>>(...);
}
############
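In case it helps frame the question: my understanding from the programming guide is that shared memory exists once per block, but the kernel body runs once per thread, so every thread in the block would execute the copy code above (redundantly, and with a race if there is no barrier). Below is a minimal sketch of what I think the once-per-block pattern should look like, using made-up names from my code (only one of the structs shown, and no error handling):

```cuda
extern __shared__ int array[];

__global__ void kernel_func(const int3* d_KernelDimsCenter /*, ... */)
{
    // Same partitioning trick as above, shown for one struct only.
    int3* shared_d_KernelDimsCenter = (int3*) array;

    // Only one thread per block fills the shared memory...
    if (threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0) {
        *shared_d_KernelDimsCenter = *d_KernelDimsCenter;
    }

    // ...and the barrier makes those writes visible to the whole block
    // before any thread reads them.
    __syncthreads();

    // Every thread can now safely read shared_d_KernelDimsCenter.
}
```

Is this guarded-write-plus-`__syncthreads()` version equivalent in result to my original code, just without the redundant per-thread writes?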