Copying data from global memory to shared memory by each thread

Make each thread copy its own part of the array. With at least 256 threads per block, that is one element per thread:

__global__ void testkernel(float *gArray)

{

  __shared__ float sArray[256];

  if (threadIdx.x < 256)

  {

    sArray[threadIdx.x] = gArray[threadIdx.x];

  }

  __syncthreads(); // wait for each thread to copy its element

  ...

}

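If you have fewer than 256 threads, each of them should copy several elements of the array (here i is the thread's index, e.g. threadIdx.x, and THREADS_NUM is the number of threads in the block):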

...

 sArray[i] = gArray[i];

 sArray[i + THREADS_NUM] = gArray[i + THREADS_NUM];

 ...
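
The same idea can be written as a loop, so it works for any block size that is not larger than the array. This is only a minimal sketch, not the one true way to do it: the names stridedCopyKernel, gOut and copyDemoMismatches are made up for illustration, and I am assuming a single block of 64 threads and a 256-element array. The kernel writes the shared data back in reverse order purely so the host can check that the copy and the __syncthreads() did their job.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define ARRAY_SIZE 256   // elements staged in shared memory, as in the kernel above
#define THREADS_NUM 64   // assumed block size, smaller than ARRAY_SIZE

// Each thread copies elements i, i + blockDim.x, i + 2 * blockDim.x, ...
__global__ void stridedCopyKernel(const float *gArray, float *gOut)
{
    __shared__ float sArray[ARRAY_SIZE];

    for (int i = threadIdx.x; i < ARRAY_SIZE; i += blockDim.x) {
        sArray[i] = gArray[i];
    }
    __syncthreads();  // all copies must finish before any thread reads sArray

    // Read elements written by other threads: output is the input reversed.
    for (int i = threadIdx.x; i < ARRAY_SIZE; i += blockDim.x) {
        gOut[i] = sArray[ARRAY_SIZE - 1 - i];
    }
}

// Hypothetical host helper: returns the number of mismatches, or -1 on a CUDA error.
int copyDemoMismatches()
{
    std::vector<float> in(ARRAY_SIZE), out(ARRAY_SIZE, 0.0f);
    for (int i = 0; i < ARRAY_SIZE; ++i) in[i] = (float)i;

    float *dIn = nullptr, *dOut = nullptr;
    size_t bytes = ARRAY_SIZE * sizeof(float);
    if (cudaMalloc(&dIn, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&dOut, bytes) != cudaSuccess) { cudaFree(dIn); return -1; }
    cudaMemcpy(dIn, in.data(), bytes, cudaMemcpyHostToDevice);

    stridedCopyKernel<<<1, THREADS_NUM>>>(dIn, dOut);
    // cudaMemcpy waits for the kernel and reports any launch or execution error.
    cudaError_t err = cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < ARRAY_SIZE; ++i) {
        if (out[i] != in[ARRAY_SIZE - 1 - i]) ++mismatches;
    }
    return mismatches;
}
```

Call copyDemoMismatches() from your main(); it should return 0 if everything was copied correctly. Because consecutive threads touch consecutive addresses on every pass of the loop, the global memory reads stay coalesced.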