Efficiently loading data in the shared memory

moein.mfh · February 15, 2021, 12:08pm

Hi,
I have a question about loading data from global memory into shared memory. My grid is defined as follows:
dim3 block(1024, 1);
dim3 grid((256 * 256) / block.x);   // 65536 threads / 1024 per block = 64 blocks
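For reference, the same launch configuration can be written with a round-up division, so the grid still covers every thread when the total is not an exact multiple of the block size. This is a minimal sketch; `blocksFor`, `perThreadWork` and `launchPerThreadWork` are hypothetical names, not part of the original kernel.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: round-up division, i.e. the number of blocks
// needed so that blocks * threadsPerBlock >= n.
constexpr unsigned int blocksFor(unsigned int n, unsigned int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

constexpr unsigned int kThreads   = 256 * 256;  // 65536 threads in total
constexpr unsigned int kBlockSize = 1024;       // per-block maximum on current GPUs
static_assert(blocksFor(kThreads, kBlockSize) == 64, "65536 / 1024 = 64 blocks");

// Hypothetical kernel: each thread computes its global index and skips
// the surplus threads of a partial last block.
__global__ void perThreadWork(int *out, unsigned int n) {
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = static_cast<int>(i);
}

// Host-side launch using the configuration from the post.
cudaError_t launchPerThreadWork(int *d_out, unsigned int n) {
    dim3 block(kBlockSize, 1);
    dim3 grid(blocksFor(n, block.x));
    perThreadWork<<<grid, block>>>(d_out, n);
    return cudaGetLastError();
}
```

For n = 65536 this gives exactly 64 blocks, the same as the original expression; the round-up only matters when n is not a multiple of 1024.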

As you can see, the total number of threads in my kernel is 256 × 256 = 65536. Each of these threads needs to read a block of memory of 640 × 4 = 2560 bytes (NumberOfElements * sizeof(int)). Since this happens repeatedly, I want to load this data from global memory into shared memory. How can I efficiently make it available to all the threads, given that the lifetime of shared memory is tied to a single block?
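If every thread reads the same 640-int table (an assumption; the post does not say whether the data differs per block), one option whose lifetime is not tied to a block is `__constant__` memory. It is filled once from the host with `cudaMemcpyToSymbol`, is visible to every block of the launch, and 2560 bytes fits easily in the 64 KB constant space. The constant cache broadcasts a value when all threads of a warp read the same address but serializes when they read different addresses, so this only pays off for warp-uniform access. A minimal sketch; `c_table`, `uploadTable`, `sumTable` and `runConstantExample` are hypothetical names.

```cuda
#include <cuda_runtime.h>

constexpr int kNumElements = 640;   // 640 ints * 4 bytes = 2560 bytes

// Hypothetical device-wide read-only table: visible to every block,
// unlike __shared__ memory, which lives only as long as its block.
__constant__ int c_table[kNumElements];

// Copy the table from host memory once, before launching the kernel.
cudaError_t uploadTable(const int *h_table) {
    return cudaMemcpyToSymbol(c_table, h_table, kNumElements * sizeof(int));
}

// Hypothetical kernel: every thread sums the whole table. All threads of a
// warp read the same c_table[k] in the same iteration, so the constant
// cache can broadcast the value instead of serializing the reads.
__global__ void sumTable(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int acc = 0;
    for (int k = 0; k < kNumElements; ++k) acc += c_table[k];
    out[i] = acc;
}

// Hypothetical host helper: uploads 0..639, launches 256 * 256 threads and
// returns the first result if it matches the last one, otherwise -1.
int runConstantExample() {
    int h_table[kNumElements];
    for (int k = 0; k < kNumElements; ++k) h_table[k] = k;
    const int n = 256 * 256;
    int *d_out = nullptr;
    if (uploadTable(h_table) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return -1;
    sumTable<<<(n + 1023) / 1024, 1024>>>(d_out, n);
    int first = -1, last = -1;
    cudaMemcpy(&first, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&last, d_out + n - 1, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return first == last ? first : -1;
}
```

On a working GPU, `runConstantExample()` should return 204480, the sum 0 + 1 + ... + 639.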
I used the following code at the start of my kernel, before any processing, but the overall processing time ended up even higher than when reading directly from global memory. So I'm looking for another way.

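int TID = threadIdx.y * blockDim.x + threadIdx.x;   // linear thread index within the block
if (TID < 640)                                      // 640 of the 1024 threads copy one int each
{
    SharedMemory[TID] = RfData[XXX];
}
__syncthreads();                                    // wait until the whole table is in shared memory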
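The per-block load can also be written as a block-stride loop, so that all threads share the copy, the global reads stay coalesced, and the code no longer assumes the block has at least 640 threads. With 64 blocks each copying 2560 bytes, the staging traffic is only about 160 KiB in total, so if the shared-memory version is still slower, the cost may lie elsewhere (for example in how the table is indexed afterwards); that is a guess, not a measured result. The sketch assumes every block needs the same 640 elements starting at `rfData[0]`, which stands in for the elided `RfData[XXX]` index; `sumShared` and `runSharedExample` are hypothetical names.

```cuda
#include <cuda_runtime.h>

constexpr int kNumElements = 640;   // 2560 bytes of int data per block

// Hypothetical kernel: each block stages the table in shared memory once,
// then every thread of that block reads it from shared memory.
__global__ void sumShared(const int *__restrict__ rfData, int *out, int n) {
    __shared__ int s_table[kNumElements];

    // Block-stride loop: consecutive threads copy consecutive elements,
    // so the global reads are coalesced and any block size works.
    for (int k = threadIdx.x; k < kNumElements; k += blockDim.x)
        s_table[k] = rfData[k];
    __syncthreads();   // every thread must reach this, so no early return above

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int acc = 0;
    for (int k = 0; k < kNumElements; ++k)
        acc += s_table[k];   // same address across the warp: broadcast, no bank conflict
    out[i] = acc;
}

// Hypothetical host helper: uploads 0..639, launches 256 * 256 threads and
// returns the first result if it matches the last one, otherwise -1.
int runSharedExample() {
    int h_table[kNumElements];
    for (int k = 0; k < kNumElements; ++k) h_table[k] = k;
    const int n = 256 * 256;
    int *d_table = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_table, sizeof(h_table)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) { cudaFree(d_table); return -1; }
    cudaMemcpy(d_table, h_table, sizeof(h_table), cudaMemcpyHostToDevice);
    sumShared<<<(n + 1023) / 1024, 1024>>>(d_table, d_out, n);
    int first = -1, last = -1;
    cudaMemcpy(&first, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&last, d_out + n - 1, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_table);
    cudaFree(d_out);
    return first == last ? first : -1;
}
```

The bounds check on `i` comes after `__syncthreads()` on purpose: returning early before the barrier would leave some threads of the block never reaching it.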

Thanks in advance.

Related topics (all in CUDA Programming and Performance):
Topic | Replies | Views | Activity
Copying data from global memory to shared memory by each thread | 6 | 17226 | January 7, 2022
Shared memory vs global memory | 6 | 3508 | April 30, 2007
Shared Memory question | 5 | 2950 | November 25, 2016
General Shared Memory Question | 5 | 6679 | March 4, 2010
memcpy equivalent for global memory to shared memo | 5 | 9313 | November 12, 2007
copying to shared block mem | 11 | 4286 | April 6, 2008
Question regarding transfer from global to shared memory | 5 | 6052 | November 27, 2010
Transfer a one-dimensional array saved by rows-major from global memory to shared memory | 1 | 497 | July 1, 2021
Data load question | 3 | 84 | December 18, 2024
memory confusion how big is local/shared/global memory? | 6 | 3498 | January 20, 2009
