Transfer a one-dimensional array stored in row-major order from global memory to shared memory

luca.andaloro · July 1, 2021, 9:40am

Hi, I’m new to CUDA and I have a question (quite a simple one, I think): how should I transfer a one-dimensional float array, which contains an image stored row by row, from global memory to shared memory? For now I have written some code, but I don't think it is efficient because the copy is done by only one thread per block. How could I have all the threads in the block share the work? The blocks are two-dimensional (32,32) and the grid is made up of N blocks (with N depending on the size of the image).

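__shared__ float s_template[4800];   // must hold at least Th*Tw floats

    // Serial copy: only thread (0,0) of the block does the work.
    // (Testing threadIdx.x == 0 alone selects a whole column of 32 threads
    // in a (32,32) block, each repeating the same full copy.)
    if (threadIdx.x == 0 && threadIdx.y == 0) {
        for (int j = 0; j < Th; j++) {          // template rows
            for (int t = 0; t < Tw; t++) {      // columns within row j
                s_template[t + j * Tw] = T[t + j * Tw];
            }
        }
    }

    __syncthreads();   // make the shared copy visible to every thread in the block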
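A minimal sketch, not from the thread, of how the loop above could be spread over all threads of a (32,32) block while keeping its row/column structure: thread (x, y) takes every blockDim.y-th row starting at row y and, within each such row, every blockDim.x-th column starting at column x. The kernel `copyTemplate2D`, the host helper `runCopyTemplate2D`, the macro `MAX_TEMPLATE` and the `out` buffer (used only to check the result) are hypothetical names; the sketch assumes `T` holds `Th*Tw` floats with `Th*Tw <= 4800`, as the question's shared array implies.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

#define MAX_TEMPLATE 4800   // hypothetical: capacity of the static shared buffer (from the question)

// Hypothetical kernel: all threads of a 2D block cooperate to copy a
// Th x Tw row-major template T from global into shared memory.
// For checking only, block 0 then writes its shared copy to `out`.
__global__ void copyTemplate2D(const float* T, int Th, int Tw, float* out)
{
    __shared__ float s_template[MAX_TEMPLATE];   // caller guarantees Th * Tw <= MAX_TEMPLATE

    // Thread (x, y) copies rows y, y + blockDim.y, ... and, within each of
    // those rows, columns x, x + blockDim.x, ...  With blockDim.x == 32 a warp
    // reads 32 consecutive floats of one row, so the global loads coalesce.
    for (int j = threadIdx.y; j < Th; j += blockDim.y) {
        for (int t = threadIdx.x; t < Tw; t += blockDim.x) {
            s_template[t + j * Tw] = T[t + j * Tw];
        }
    }
    __syncthreads();   // every row must be in shared memory before it is read

    if (blockIdx.x == 0) {
        for (int j = threadIdx.y; j < Th; j += blockDim.y)
            for (int t = threadIdx.x; t < Tw; t += blockDim.x)
                out[t + j * Tw] = s_template[t + j * Tw];
    }
}

// Hypothetical host helper: fills a test template, launches 4 blocks of
// (32,32) threads and returns true if block 0's shared copy matches T.
bool runCopyTemplate2D(int Th, int Tw)
{
    assert(Th > 0 && Tw > 0);
    const int n = Th * Tw;
    if (n > MAX_TEMPLATE) return false;          // would overflow s_template

    std::vector<float> h(n);
    for (int i = 0; i < n; ++i) h[i] = static_cast<float>(i);

    const size_t bytes = n * sizeof(float);
    float *dT = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dT, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, bytes) != cudaSuccess) { cudaFree(dT); return false; }
    cudaMemcpy(dT, h.data(), bytes, cudaMemcpyHostToDevice);

    copyTemplate2D<<<4, dim3(32, 32)>>>(dT, Th, Tw, dOut);
    bool ok = (cudaGetLastError() == cudaSuccess);   // catches launch-configuration errors

    std::vector<float> back(n);
    ok = ok && cudaMemcpy(back.data(), dOut, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int i = 0; ok && i < n; ++i)
        if (back[i] != h[i]) ok = false;

    cudaFree(dT);
    cudaFree(dOut);
    return ok;
}
```

Because consecutive threadIdx.x values touch consecutive columns of the same row, the reads from `T` are coalesced, which the single-thread loop cannot benefit from. A 60 x 80 template, for instance, exactly fills the 4800-float buffer.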

Robert_Crovella · July 1, 2021, 1:20pm

You would need to modify that example for a 2D threadblock by flattening the 2D thread index into a linear one, something like this:

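#define SSIZE 2592

__shared__ float TMshared[SSIZE];

  // Flatten the 2D thread index into a linear index within the block
  int lidx = threadIdx.x + blockDim.x*threadIdx.y;
  // Block-stride loop: each thread copies every (blockDim.x*blockDim.y)-th element
  while (lidx < SSIZE){
    TMshared[lidx] = TM[lidx];   // TM: pointer to the source data in global memory
    lidx += blockDim.x*blockDim.y;
  }

__syncthreads();   // wait until the whole array is in shared memory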

With both my code and yours, each block ends up with its own copy of the same data in its shared memory, since shared memory is private to each block.
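The point that every block receives its own copy can be checked with a small self-contained program. The sketch below assumes a CUDA-capable device; it uses the same flattened block-stride loop, but sizes the shared array at launch time with `extern __shared__` and the third `<<<>>>` launch parameter instead of a fixed `SSIZE`. `copyToSharedPerBlock` and `runPerBlockCopy` are hypothetical names, and each block writes its shared copy to its own slice of `out` purely for verification.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each block cooperatively copies n floats from global
// memory (TM) into dynamically sized shared memory, then writes its own copy
// to out[blockIdx.x * n + i] so the host can inspect every block's copy.
__global__ void copyToSharedPerBlock(const float* TM, int n, float* out)
{
    extern __shared__ float TMshared[];   // size = third <<<>>> launch parameter (bytes)

    const int nthreads = blockDim.x * blockDim.y;             // 1024 for (32,32)
    const int tid = threadIdx.x + blockDim.x * threadIdx.y;   // flattened 2D index

    for (int i = tid; i < n; i += nthreads)   // block-stride copy into shared memory
        TMshared[i] = TM[i];
    __syncthreads();

    for (int i = tid; i < n; i += nthreads)
        out[(size_t)blockIdx.x * n + i] = TMshared[i];
}

// Hypothetical host helper: launches `blocks` blocks of (32,32) threads and
// returns true if every block's shared copy equals the source array.
bool runPerBlockCopy(int n, int blocks)
{
    assert(n > 0 && blocks > 0);
    std::vector<float> h(n);
    for (int i = 0; i < n; ++i) h[i] = 0.5f * i;

    const size_t bytes = n * sizeof(float);
    float *dTM = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dTM, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, bytes * blocks) != cudaSuccess) { cudaFree(dTM); return false; }
    cudaMemcpy(dTM, h.data(), bytes, cudaMemcpyHostToDevice);

    copyToSharedPerBlock<<<blocks, dim3(32, 32), bytes>>>(dTM, n, dOut);
    bool ok = (cudaGetLastError() == cudaSuccess);   // catches launch-configuration errors

    std::vector<float> back((size_t)n * blocks);
    ok = ok && cudaMemcpy(back.data(), dOut, bytes * blocks,
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int b = 0; ok && b < blocks; ++b)
        for (int i = 0; ok && i < n; ++i)
            if (back[(size_t)b * n + i] != h[i]) ok = false;

    cudaFree(dTM);
    cudaFree(dOut);
    return ok;
}
```

For example, `runPerBlockCopy(2592, 8)` copies an array of `SSIZE` floats into the shared memory of 8 blocks and should return true. Dynamic shared memory is limited to 48 KB per block by default (12288 floats); larger sizes need an explicit opt-in through `cudaFuncSetAttribute` on devices that support it.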
