Sorry for this newbie question, but I couldn’t find anywhere that this has been answered clearly. I need some arrays in shared memory that are accessed by every thread in a block. Since the host can’t load data directly into shared memory, should I have something like this at the very beginning of the kernel?
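// gArray points to an array in global memory, passed to the kernel as a pointer
// sArray is the shared array
__global__ void testkernel(float *gArray)
{
    __shared__ float sArray[256];
    // every thread in the block runs this whole loop
    for (int i = 0; i < 256; i++)
        sArray[i] = gArray[i];
    // ... computation that reads sArray ...
}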
If I have this in the kernel, won’t it be executed by every single thread in the block, each one copying the same data into shared memory? I can’t have each thread copy only its own portion of the array, because every thread needs the entire array to proceed with the computation. Since the array is read-only, I could perhaps use texture memory instead. I could also have the threads copy different portions and then call __syncthreads() to wait until all of them have finished. But is there any other way to load the data into shared memory once for all the threads?
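As an illustration of the __syncthreads() option mentioned above, here is a minimal sketch, assuming a fixed array length of 256 floats as in the original snippet. The output buffer `out`, the "sum the whole array" computation and the `run_demo` host helper are hypothetical, added only to make the example runnable, and CUDA error checking is omitted for brevity. The point it shows: each thread copies only a strided share of the array, but after the barrier every thread can read all of `sArray`.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Length of the read-only array (256 in the question).
constexpr int N = 256;

__global__ void testkernel(const float *gArray, float *out)
{
    __shared__ float sArray[N];

    // Cooperative load: thread t copies elements t, t + blockDim.x, ...
    // Together the block copies all N elements exactly once, for any block size.
    for (int i = threadIdx.x; i < N; i += blockDim.x)
        sArray[i] = gArray[i];

    // Barrier: no thread continues until every thread has written its share,
    // so from here on each thread sees the complete array.
    __syncthreads();

    // Hypothetical computation that reads the ENTIRE shared array in every thread.
    float sum = 0.0f;
    for (int i = 0; i < N; i++)
        sum += sArray[i];

    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    out[gid] = sum;
}

// Hypothetical host helper: fills gArray with 0, 1, ..., 255, launches the kernel
// and returns how many outputs differ from the expected sum (32640).
int run_demo(int blocks, int threads)
{
    std::vector<float> hIn(N);
    for (int i = 0; i < N; i++)
        hIn[i] = static_cast<float>(i);

    const int total = blocks * threads;
    float *dIn = nullptr;
    float *dOut = nullptr;
    cudaMalloc(&dIn, N * sizeof(float));
    cudaMalloc(&dOut, total * sizeof(float));
    cudaMemcpy(dIn, hIn.data(), N * sizeof(float), cudaMemcpyHostToDevice);

    testkernel<<<blocks, threads>>>(dIn, dOut);

    // cudaMemcpy waits for the kernel to finish before copying back.
    std::vector<float> hOut(total);
    cudaMemcpy(hOut.data(), dOut, total * sizeof(float), cudaMemcpyDeviceToHost);

    int mismatches = 0;
    for (int i = 0; i < total; i++)
        if (hOut[i] != 32640.0f)
            mismatches++;

    cudaFree(dIn);
    cudaFree(dOut);
    return mismatches;
}

int main()
{
    // 64 threads per block: fewer threads than elements,
    // so each thread copies 4 elements in the strided loop.
    int bad = run_demo(4, 64);
    std::printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

With this pattern each element is read from global memory once per block rather than once per thread. Shared memory is private to a block, so every block still performs its own copy.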
Sorry if I’m missing something here.