Correct Use of Shared Memory?

danuk · January 6, 2010, 12:06pm

I don’t see any reason why using shared memory will speed up a program if the value is only retrieved from global memory once. Is there any reason to do this?

Here’s a contrived example: I have a kernel which takes a value, multiplies it by 5, and writes the result to another array. I am presuming this is the fastest way of achieving this:

__global__ void kernel(float *in, float* out)
{
	unsigned int idx = blockDim.x * blockIdx.x + threadIdx.x;
	out[idx] = 5.0f * in[idx];
}

avidday · January 6, 2010, 12:18pm

You are absolutely right. In that example, global memory reads should be fully coalesced and there would be no advantage to using shared memory. However, consider an only slightly different variant of the same idea:

__global__ void kernel(float *in, float* out)
{
	unsigned int idx = blockDim.x * blockIdx.x + threadIdx.x;
	out[idx] = 5.0f * in[idx+1];
}

That version can benefit enormously from the use of shared memory, particularly on compute capability 1.0/1.1 devices, where the one-element offset makes the reads misaligned and therefore uncoalesced.
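As a rough sketch of what staging through shared memory could look like here (this is an illustration, not code from the thread): each block does an aligned load of its own tile, one thread fetches the single extra element past the tile, and every thread then reads its shifted value from shared memory. The names kernel_shared, BLOCK_SIZE, tile and the length parameter n are assumptions made for the example, as is the requirement that in holds at least n + 1 floats and that the kernel is launched with blockDim.x equal to BLOCK_SIZE.

```cuda
#include <cuda_runtime.h>

// Assumed launch configuration: blockDim.x == BLOCK_SIZE (hypothetical constant).
#define BLOCK_SIZE 256

// Sketch only. Assumes `in` holds at least n + 1 floats and `out` holds n floats.
__global__ void kernel_shared(const float *in, float *out, unsigned int n)
{
    // One tile per block, plus one extra slot for the element past the tile.
    __shared__ float tile[BLOCK_SIZE + 1];

    unsigned int base = blockDim.x * blockIdx.x;
    unsigned int idx  = base + threadIdx.x;

    // Aligned load: thread t reads in[base + t]. Valid input indices are 0..n.
    if (idx <= n)
        tile[threadIdx.x] = in[idx];

    // A single thread fetches the one element that lies just past this block's tile.
    if (threadIdx.x == 0 && base + blockDim.x <= n)
        tile[blockDim.x] = in[base + blockDim.x];

    // Every thread in the block reaches this barrier (it is outside the conditionals).
    __syncthreads();

    // The shifted read now comes from shared memory instead of global memory.
    if (idx < n)
        out[idx] = 5.0f * tile[threadIdx.x + 1];
}
```

A launch along the lines of kernel_shared<<<(n + BLOCK_SIZE - 1) / BLOCK_SIZE, BLOCK_SIZE>>>(d_in, d_out, n) would match those assumptions, with d_in and d_out being hypothetical device pointers. Whether this actually beats the direct version depends on the hardware, so it is worth timing both.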
