I want to copy some data from an array in global memory to the shared memory of the streaming multiprocessor.
The catch is that the data I want to copy is sparse (strided), i.e.:
Array in global memory:
[0 1 2 3 4 5 6 7 8]
To simplify the example, I'm using just 3 blocks.
I want to copy
[0 3 6] into the shared memory of block 0
[1 4 7] into the shared memory of block 1
[2 5 8] into the shared memory of block 2
Is there a simple way to do this?
Or do I have to go through the whole array in global memory and copy it into shared memory one element at a time?
And how would I do that?
This depends on how you lay out your grid and block dimensions (dim3), and therefore on how the indices are calculated.
With unique block and thread indices, each thread can use a strided index into global memory to initialize one element of its block's shared memory array.
#define SMEM_LENGTH 3
template <typename T>
__global__ void smemKernel(T* valuesIn_g)
{
//define block and thread ids, use long long int if necessary!
const unsigned int blockId = ... //unique block Id, depends on your grid dimension!
//stride between consecutive elements of one block = number of blocks (simple 1D, x-only grid)
const unsigned int numBlocks = gridDim.x;
//declare shared memory array
__shared__ T values_s[SMEM_LENGTH];
//initialize shared memory array with a strided read!
//block b loads elements b, b + numBlocks, b + 2 * numBlocks, ...
//make sure the local thread index is smaller than the length of the shared array!
if(threadIdx.x < SMEM_LENGTH)
    values_s[threadIdx.x] = valuesIn_g[threadIdx.x * numBlocks + blockId];
__syncthreads();
//... rest of your code
}
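For reference, here is a minimal, self-contained sketch of the same idea applied to the 9-element example from the question. It assumes a simple 1D grid (so blockIdx.x is already a unique block id) and int data. The names stridedLoadKernel, runStridedLoad and valuesOut_g are made up for this illustration, and the write-back to valuesOut_g exists only so the host can inspect what each block loaded. Error checking of the CUDA calls is left out to keep it short.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int SMEM_LENGTH = 3;  // elements held per block (assumed for this example)

// Each block gathers every gridDim.x-th element, starting at its own block index.
__global__ void stridedLoadKernel(const int* valuesIn_g, int* valuesOut_g, int n)
{
    __shared__ int values_s[SMEM_LENGTH];

    // 1D grid assumed: thread t of block b reads element t * numBlocks + b.
    const int srcIdx = threadIdx.x * gridDim.x + blockIdx.x;
    const bool active = (threadIdx.x < SMEM_LENGTH) && (srcIdx < n);

    if (active)
        values_s[threadIdx.x] = valuesIn_g[srcIdx];
    __syncthreads();  // all loads are finished before anyone reads values_s

    // Illustration only: copy the shared array back out, block by block.
    if (active)
        valuesOut_g[blockIdx.x * SMEM_LENGTH + threadIdx.x] = values_s[threadIdx.x];
}

// Hypothetical host helper: runs the kernel and returns what each block loaded.
std::vector<int> runStridedLoad(const std::vector<int>& input, int numBlocks)
{
    const int n = static_cast<int>(input.size());
    std::vector<int> output(numBlocks * SMEM_LENGTH, -1);  // -1 marks "not written"
    const size_t inBytes = n * sizeof(int);
    const size_t outBytes = output.size() * sizeof(int);

    int* in_d = nullptr;
    int* out_d = nullptr;
    cudaMalloc(&in_d, inBytes);
    cudaMalloc(&out_d, outBytes);
    cudaMemcpy(in_d, input.data(), inBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(out_d, output.data(), outBytes, cudaMemcpyHostToDevice);

    // One thread per shared-memory element in each block.
    stridedLoadKernel<<<numBlocks, SMEM_LENGTH>>>(in_d, out_d, n);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(output.data(), out_d, outBytes, cudaMemcpyDeviceToHost);
    cudaFree(in_d);
    cudaFree(out_d);
    return output;
}
```

Calling runStridedLoad({0, 1, 2, 3, 4, 5, 6, 7, 8}, 3) should give back 0 3 6, 1 4 7, 2 5 8 (block 0, block 1, block 2). Keep in mind that with this layout neighbouring threads of a block read addresses that are gridDim.x elements apart, so the global memory reads are not coalesced; if that turns out to matter, rearranging the data in global memory may be worth considering.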