How to copy a block to shared memory

I have data in global memory: unsigned char* Data
And I have shared memory: extern __shared__ unsigned char shared1[];

How can I copy Data to shared memory?

shared1[threadIdx.x] = Data[blockIdx.x*blockDim.x + threadIdx.x];

Obviously, the addressing will change based on what you actually want to copy. This is just an example of a common addressing pattern.
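For context, here is how that one-line copy might sit inside a complete kernel. This is only a minimal sketch: the kernel name copyTileKernel, the output buffer Out, and the host helper runCopyTile are made up for illustration, and it assumes shared1 is sized through the third kernel launch parameter.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Dynamically sized shared array; its size in bytes is the third launch parameter.
extern __shared__ unsigned char shared1[];

// Hypothetical kernel: each thread stages one byte of Data in shared memory,
// then writes it back out so the host can verify the copy.
__global__ void copyTileKernel(const unsigned char* Data, unsigned char* Out, int n)
{
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid < n)
        shared1[threadIdx.x] = Data[gid];
    __syncthreads();   // make the whole tile visible to every thread in the block
    if (gid < n)
        Out[gid] = shared1[threadIdx.x];
}

// Hypothetical host helper: returns true if the round trip preserved the data.
bool runCopyTile(int n, int threads)
{
    std::vector<unsigned char> h_in(n), h_out(n, 0);
    for (int i = 0; i < n; ++i) h_in[i] = (unsigned char)(i % 251);

    unsigned char *d_in = 0, *d_out = 0;
    if (cudaMalloc(&d_in, n) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in.data(), n, cudaMemcpyHostToDevice);

    int blocks = (n + threads - 1) / threads;   // round up so every byte is covered
    copyTileKernel<<<blocks, threads, threads * sizeof(unsigned char)>>>(d_in, d_out, n);
    bool ok = (cudaGetLastError() == cudaSuccess) &&
              (cudaDeviceSynchronize() == cudaSuccess);
    cudaMemcpy(h_out.data(), d_out, n, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return ok && h_in == h_out;
}
```

Calling runCopyTile(1000, 128) from main would launch 8 blocks of 128 threads, each with 128 bytes of shared memory.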

I want to copy ALL the data at once.

Maybe with cudaMemcpy…?

How big is your Data array? Will it even fit in shared memory? There is only 16K of shared memory maximum available per block.
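If you would rather check the limit at run time than rely on a fixed number, something like the following could work. It is a sketch only: fitsInSharedMemory is a hypothetical helper name, and it relies on the sharedMemPerBlock field that cudaGetDeviceProperties fills in.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper: true if `bytes` of shared memory fit in one block on `device`.
// Returns false if the device properties cannot be queried.
bool fitsInSharedMemory(size_t bytes, int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        return false;
    return bytes <= prop.sharedMemPerBlock;
}
```

For example, fitsInSharedMemory(dataSizeInBytes, 0) tells you whether the whole array could be staged by a single block on device 0 (dataSizeInBytes stands for whatever size your Data array has).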

cudaMemcpy cannot be called on the device. And there is no magic function call that will do this for you. CUDA gives you complete control over how you read the memory into arrays, and this is a good thing because of how sensitive performance is to the memory access pattern you use.

If you need to copy more than the block size at once, you can use a sliding window technique:

for (int offset = 0; offset < Nblocks; offset++)
    shared1[threadIdx.x + offset*blockDim.x] = Data[start + threadIdx.x + offset*blockDim.x];
__syncthreads();  // wait for every thread's loads before anyone reads shared1

Note: my examples assume the data you are addressing follows the guidelines for fully coalesced reads in the programming guide. In particular, if your data is really stored as unsigned chars, then this won’t be a fully coalesced read. Check the programming guide for details. There is an SDK example dealing with this too. Note also that putting unsigned chars into shared memory is asking for bank conflicts.
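One possible way around both issues is to move the bytes as 32-bit words, so that each thread reads and writes a full word. The sketch below is an illustration under assumptions, not the SDK example: stageWordsKernel, runStageWords, sharedWords, and Out are hypothetical names, the byte count is assumed to be a multiple of 4, and the source pointer is assumed to be 4-byte aligned (memory from cudaMalloc is).

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: one block stages `words` 32-bit words (4 bytes each)
// from Data into shared memory, then copies them out as bytes for checking.
__global__ void stageWordsKernel(const unsigned int* Data, unsigned char* Out, int words)
{
    extern __shared__ unsigned int sharedWords[];

    // Sliding window: consecutive threads read consecutive 32-bit words.
    for (int i = threadIdx.x; i < words; i += blockDim.x)
        sharedWords[i] = Data[i];
    __syncthreads();   // all loads must finish before any thread reads the tile

    // View the staged words as bytes again (done here only to verify the copy).
    const unsigned char* bytes = (const unsigned char*)sharedWords;
    for (int i = threadIdx.x; i < words * 4; i += blockDim.x)
        Out[i] = bytes[i];
}

// Hypothetical host helper: returns true if the staged bytes match the input.
bool runStageWords(int words, int threads)
{
    int nBytes = words * 4;
    std::vector<unsigned char> h_in(nBytes), h_out(nBytes, 0);
    for (int i = 0; i < nBytes; ++i) h_in[i] = (unsigned char)(i % 251);

    unsigned char *d_in = 0, *d_out = 0;
    if (cudaMalloc(&d_in, nBytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, nBytes) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in.data(), nBytes, cudaMemcpyHostToDevice);

    // Single block; nBytes of dynamic shared memory must fit the per-block limit.
    stageWordsKernel<<<1, threads, nBytes>>>((const unsigned int*)d_in, d_out, words);
    bool ok = (cudaGetLastError() == cudaSuccess) &&
              (cudaDeviceSynchronize() == cudaSuccess);
    cudaMemcpy(h_out.data(), d_out, nBytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return ok && h_in == h_out;
}
```

runStageWords(1024, 128) stages 4096 bytes, which stays under the 16K limit mentioned above.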


Never mind, I misunderstood your question…


I solved this task.

union type128bit
{
    unsigned char a[128];  // note: this is 128 bytes, despite the name
};

for(int f = 0; f < SHARED_MEM_STEP; f += lPitch1)
{
    // copy one type128bit chunk from each plane into shared memory
    *(type128bit*)&shared[f + SHARED_MEM_STEP * (i+j*lPitch1) + SHARED_MEM_STEP] = *(type128bit*)&plane1[(ys1 + iy)*lPitch2 + xs1 + f];
    *(type128bit*)&shared[f + SHARED_MEM_STEP * (i+j*lPitch1)] = *(type128bit*)&plane2[((int)y  + iy)*lPitch1 + (int)x + f];
}