Simple global data copy using shared memory: why bother with shared memory when simply copying global data?

baoyun · March 9, 2012, 8:34pm

Hi, All:

I have seen some people use shared memory in a kernel for a simple copy of global device data.

Say we have *din and *dout in global device memory. Instead of using
dout[global_id] = din[global_id];

they copy din into shared memory and then assign the shared memory value to the output:
shared[local_id] = din[global_id];
dout[global_id] = shared[local_id];

In my view, we have to read and write global memory anyway, so why bother with shared memory?

Is there any performance gain from doing that?

Thanks

mfatica · March 9, 2012, 8:50pm

You are correct, there is no gain in this case (and possibly even a slowdown).
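
For reference, the two variants under discussion can be written out as complete kernels. This is only a minimal sketch: the kernel names, the block size `BLOCK`, and the host helper `copies_match` are hypothetical names chosen for illustration and do not come from the thread. It shows that the staged version performs the same global load and store as the direct version, plus an extra shared-memory store and load.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int BLOCK = 256;  // threads per block (assumed value)

// Direct copy: one global load and one global store per thread.
__global__ void copy_direct(const float* din, float* dout, int n) {
    int global_id = blockIdx.x * blockDim.x + threadIdx.x;
    if (global_id < n) dout[global_id] = din[global_id];
}

// Staged copy: the same global traffic, plus a shared-memory round trip.
// Each thread reads back only the slot it wrote itself, so no
// __syncthreads() is needed here.
__global__ void copy_via_shared(const float* din, float* dout, int n) {
    __shared__ float shared[BLOCK];
    int local_id = threadIdx.x;
    int global_id = blockIdx.x * blockDim.x + local_id;
    if (global_id < n) {
        shared[local_id] = din[global_id];
        dout[global_id] = shared[local_id];
    }
}

// Hypothetical host helper: runs one of the kernels on n floats and
// reports whether the output equals the input.
bool copies_match(int n, bool use_shared) {
    std::vector<float> h_in(n), h_out(n, -1.0f);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float* din = nullptr;
    float* dout = nullptr;
    if (cudaMalloc(&din, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dout, bytes) != cudaSuccess) { cudaFree(din); return false; }
    cudaMemcpy(din, h_in.data(), bytes, cudaMemcpyHostToDevice);

    int grid = (n + BLOCK - 1) / BLOCK;
    if (use_shared) copy_via_shared<<<grid, BLOCK>>>(din, dout, n);
    else            copy_direct<<<grid, BLOCK>>>(din, dout, n);

    // The blocking copy back also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h_out.data(), dout, bytes, cudaMemcpyDeviceToHost);
    cudaFree(din);
    cudaFree(dout);
    return err == cudaSuccess && h_out == h_in;
}
```

Both kernels produce identical output; any timing difference would have to be measured on the target GPU, for example with a profiler.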

baoyun · March 9, 2012, 8:57pm

What if we do a simple scaling multiplication, dout[global_id] = din[global_id]*global_id?
Do we gain anything by using shared memory? I guess still not.

njuffa · March 9, 2012, 9:07pm

The main uses of shared memory are:

(1) software-controlled cache; obviously this only makes sense if there is data re-use
(2) passing (or sharing) data between threads in the same thread block (thus the name)
(3) on the fly re-layout of data to maximize global memory coalescing (for example, block-wise transpose)

None of these applies to scaling a vector in global memory, which is a simple streaming operation. So shared memory is not needed for that (and the overhead may hurt performance, as mfatica pointed out).
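
To illustrate use (3), here is a minimal sketch of a block-wise transpose, one case where staging data in shared memory can pay off. The tile size `TILE`, the kernel name, and the host helper `transpose_is_correct` are assumptions made for illustration, not anything from the thread. Each block reads a tile row by row, then writes it out transposed, so that both the global reads and the global writes touch consecutive addresses.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int TILE = 32;  // tile edge length (assumed value)

// Transposes a height x width row-major matrix into a width x height one.
// Launch with blockDim = (TILE, TILE).
__global__ void transpose_tiled(const float* in, float* out, int width, int height) {
    // The +1 padding is a common trick to avoid shared-memory bank conflicts.
    __shared__ float tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;  // input column
    int y = blockIdx.y * TILE + threadIdx.y;  // input row
    if (x < width && y < height) tile[threadIdx.y][threadIdx.x] = in[y * width + x];

    // Threads read elements written by other threads, so a barrier is required.
    __syncthreads();

    x = blockIdx.y * TILE + threadIdx.x;  // output column (output has `height` columns)
    y = blockIdx.x * TILE + threadIdx.y;  // output row
    if (x < height && y < width) out[y * height + x] = tile[threadIdx.x][threadIdx.y];
}

// Hypothetical host helper: transposes a test matrix on the GPU and
// checks every element against the expected value.
bool transpose_is_correct(int width, int height) {
    size_t count = static_cast<size_t>(width) * height;
    std::vector<float> h_in(count), h_out(count, -1.0f);
    for (size_t i = 0; i < count; ++i) h_in[i] = static_cast<float>(i);

    float* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, count * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, count * sizeof(float)) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in.data(), count * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((width + TILE - 1) / TILE, (height + TILE - 1) / TILE);
    transpose_tiled<<<grid, block>>>(d_in, d_out, width, height);

    cudaError_t err = cudaMemcpy(h_out.data(), d_out, count * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int r = 0; r < height; ++r)
        for (int c = 0; c < width; ++c)
            if (h_out[static_cast<size_t>(c) * height + r] !=
                h_in[static_cast<size_t>(r) * width + c]) return false;
    return true;
}
```

Unlike the plain copy, the threads here read shared-memory slots written by other threads in the block, which is what makes the shared-memory stage useful rather than pure overhead.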

baoyun · March 9, 2012, 9:44pm

Thanks a lot.
