I need some help understanding how shared memory is used. The following code is taken from the Dr. Dobb's article series:
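__global__ void reverseArrayBlock(int *d_out, int *d_in)
{
    // Dynamically sized shared array; its size is the third launch parameter
    extern __shared__ int s_data[];

    int inOffset = blockDim.x * blockIdx.x;
    int in = inOffset + threadIdx.x;
    // Each thread loads one element and stores it reversed within the block
    s_data[blockDim.x - 1 - threadIdx.x] = d_in[in];

    // Wait until every thread in the block has written its element
    __syncthreads();

    // Write the block's data, in forward order, to the mirrored block position
    int outOffset = blockDim.x * (gridDim.x - 1 - blockIdx.x);
    int out = outOffset + threadIdx.x;
    d_out[out] = s_data[threadIdx.x];
}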
The kernel is launched as follows:
// launch kernel
dim3 dimGrid(numBlocks);
dim3 dimBlock(numThreadsPerBlock);
reverseArrayBlock<<< dimGrid, dimBlock, sharedMemSize >>>( d_b, d_a );
The array d_b is much larger than the shared memory.
What I understand is this: the kernel reverses the order of the input array. In other words, it reads the elements of d_a and stores them in reverse order in d_b. My question is: how does it do this, given that it uses only a small array in shared memory and yet handles every position of the array?
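To check my understanding, here is a minimal host-only C++ sketch (no GPU involved) that emulates the kernel's index arithmetic. The function name emulateReverse and its parameters are my own and not from the article. It assumes that each block gets its own private copy of s_data holding blockDim.x ints, and that the array length is exactly gridDim.x * blockDim.x. I am not sure this is what really happens on the device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only emulation of reverseArrayBlock's index arithmetic.
// Assumption: every block owns a private s_data of blockDimX ints.
std::vector<int> emulateReverse(const std::vector<int>& d_in,
                                int gridDimX, int blockDimX)
{
    // Assumption: the array is covered exactly by gridDimX * blockDimX threads.
    assert(d_in.size() == static_cast<std::size_t>(gridDimX) * blockDimX);

    std::vector<int> d_out(d_in.size());
    for (int blockIdxX = 0; blockIdxX < gridDimX; ++blockIdxX) {
        std::vector<int> s_data(blockDimX);  // one small buffer per block

        // Phase 1: each "thread" stores its element reversed within the block.
        for (int threadIdxX = 0; threadIdxX < blockDimX; ++threadIdxX) {
            int in = blockDimX * blockIdxX + threadIdxX;
            s_data[blockDimX - 1 - threadIdxX] = d_in[in];
        }

        // On the GPU, a barrier would separate these two phases.

        // Phase 2: the block copies its buffer to the mirrored block position.
        for (int threadIdxX = 0; threadIdxX < blockDimX; ++threadIdxX) {
            int out = blockDimX * (gridDimX - 1 - blockIdxX) + threadIdxX;
            d_out[out] = s_data[threadIdxX];
        }
    }
    return d_out;
}
```

For example, calling emulateReverse on the values 0 to 11 with gridDimX = 3 and blockDimX = 4 gives 11 down to 0, even though each buffer only ever holds 4 ints at a time.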
My other question is: what is gridDim.x used for? I don't know what it is.