From Global to Shared: copy some data from global mem to shared mem

Hi all.

I want to copy part of an array in global memory to the shared memory of the streaming multiprocessor.
The catch is that the data I want to copy is not contiguous but strided, i.e.:

array in global mem
[0 1 2 3 4 5 6 7 8]

To simplify the example, I’m using just 3 blocks.
I want to copy
[0 3 6] in the shared mem of the block 0
[1 4 7] in the shared mem of the block 1
[2 5 8] in the shared mem of the block 2
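In other words, block b should end up with every element whose index i satisfies i % 3 == b, stored in shared-memory slot i / 3. A small sketch of that index arithmetic, assuming the blocks take elements round-robin as in the example (stridedIndex and ownerOf are hypothetical helper names, not part of any CUDA API):

```cuda
// Global index of the element that block `b` keeps in shared-memory slot `t`
// when `numBlocks` blocks take the array round-robin
// (block 0 gets 0,3,6; block 1 gets 1,4,7; block 2 gets 2,5,8 for numBlocks = 3).
__host__ __device__ inline int stridedIndex(int b, int t, int numBlocks)
{
    return t * numBlocks + b;
}

// Inverse mapping: which block and which slot global element `i` ends up in.
__host__ __device__ inline void ownerOf(int i, int numBlocks, int* b, int* t)
{
    *b = i % numBlocks;  // block that owns element i
    *t = i / numBlocks;  // slot inside that block's shared array
}
```

With 3 blocks, stridedIndex(1, 2, 3) is 7, the last element of block 1’s [1 4 7].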

Is there a simple way to do it?
Or do I have to read the whole array in global mem and copy it element by element into shared mem?
And how do I do that?

I’m new to CUDA, please help me.

This depends on how you lay out your grid and block dimensions (dim3), and therefore on how the indices are calculated.
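For reference, one common way to turn the built-in indices into unique linear ids for 1D, 2D or 3D launches, with x varying fastest. This is a sketch: linearBlockId, linearThreadId and showIds are hypothetical names, and the helpers take the built-in values as dim3 parameters (blockIdx/threadIdx are uint3, which converts implicitly to dim3) so they can also be called on the host.

```cuda
#include <cuda_runtime.h>

// Unique linear block id for a 1D, 2D or 3D grid (x varies fastest).
// For very large grids, switch to unsigned long long to avoid overflow.
__host__ __device__ inline unsigned int linearBlockId(dim3 b, dim3 g)
{
    return b.x + b.y * g.x + b.z * g.x * g.y;
}

// Unique linear thread id inside a 1D, 2D or 3D block.
__host__ __device__ inline unsigned int linearThreadId(dim3 t, dim3 blk)
{
    return t.x + t.y * blk.x + t.z * blk.x * blk.y;
}

// Hypothetical kernel showing device-side use: every thread records its block id.
__global__ void showIds(unsigned int* out)
{
    unsigned int block  = linearBlockId(blockIdx, gridDim);
    unsigned int thread = linearThreadId(threadIdx, blockDim);
    unsigned int threadsPerBlock = blockDim.x * blockDim.y * blockDim.z;
    out[block * threadsPerBlock + thread] = block;
}
```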

With unique block and thread indices you can use a strided global index to initialize each block’s shared memory array: thread t of block b reads element t * numBlocks + b.

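#define SMEM_LENGTH 3 //elements per block; no semicolon after a #define!

template <typename T>
__global__ void smemKernel(const T* valuesIn_g, unsigned int n) //n = length of the input data
{
	//define block and thread ids, use long long int if necessary!
	const unsigned int blockId = blockIdx.x; //unique block Id for a 1D grid, depends on your grid dimension!
	const unsigned int tid = threadIdx.x; //local thread index, for simple x-usage of a block

	//strided global index: block b reads b, b + gridDim.x, b + 2*gridDim.x, ...
	const unsigned int gid = tid * gridDim.x + blockId;

	//declare shared memory array (type first, then name)
	__shared__ T values_s[SMEM_LENGTH];

	//initialize shared memory array!
	//index shared memory with the local thread index and global memory with the strided one;
	//make sure both stay smaller than their array lengths!
	if (tid < SMEM_LENGTH && gid < n)
		values_s[tid] = valuesIn_g[gid];

	//wait until every thread of the block has written its element
	__syncthreads();

	//... rest of your code
}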
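A complete sketch that can be compiled with nvcc and checked on the host, assuming int data and a 1D grid. gatherStrided and runGather are hypothetical names; the kernel uses dynamically sized shared memory (extern __shared__ plus the third launch parameter) and a block-stride loop, so one block can load more elements than it has threads. It copies the shared array back out only so the host can see what each block received. Error checking of the CUDA calls is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Block b gathers elements b, b + gridDim.x, b + 2*gridDim.x, ... into its
// shared memory, then writes them to out[b*perBlock + t] for inspection.
__global__ void gatherStrided(const int* in, int n, int perBlock, int* out)
{
    extern __shared__ int values_s[];              // size given at launch
    for (int t = threadIdx.x; t < perBlock; t += blockDim.x) {
        int gid = t * gridDim.x + blockIdx.x;      // strided global index
        values_s[t] = (gid < n) ? in[gid] : -1;    // -1 marks padding
    }
    __syncthreads();                               // shared array is complete
    for (int t = threadIdx.x; t < perBlock; t += blockDim.x)
        out[blockIdx.x * perBlock + t] = values_s[t];
}

// Hypothetical host wrapper: returns numBlocks*perBlock values, block after block.
std::vector<int> runGather(const std::vector<int>& h_in, int numBlocks)
{
    int n = (int)h_in.size();
    int perBlock = (n + numBlocks - 1) / numBlocks;  // ceil(n / numBlocks)
    size_t outCount = (size_t)numBlocks * perBlock;

    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, outCount * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    int threads = 32;  // one warp; the loop handles perBlock > 32
    gatherStrided<<<numBlocks, threads, perBlock * sizeof(int)>>>(d_in, n, perBlock, d_out);

    std::vector<int> h_out(outCount);
    // cudaMemcpy waits for the kernel (same default stream) before copying.
    cudaMemcpy(h_out.data(), d_out, outCount * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

For the array [0 1 2 3 4 5 6 7 8] and 3 blocks, runGather returns {0, 3, 6, 1, 4, 7, 2, 5, 8}: block 0 held [0 3 6], block 1 [1 4 7] and block 2 [2 5 8]. Neighboring threads read addresses gridDim.x elements apart, so these global loads are not coalesced; for larger arrays, giving each block a contiguous chunk is usually faster if the algorithm allows it.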


Thanks, it really helped me!

I’m now changing all my code to use shared mem :D