Reduction: shared VS global memory

grafd · May 21, 2008, 1:41pm

Hi everyone,

In order to test the difference in speed between shared and global memory in CUDA, I took the reduction code (from Mark Harris's paper) and replaced the shared memory array with a global one passed as a parameter. But this is not working, and my simple question is: why not? I hope that the answer is as simple as the question :D. Just as info: I allocated a device pointer of the same size as the shared memory that was used, removed the shared pointer, and passed the global one as an additional parameter to the reduction kernel. The rest of the code is unchanged.

Thanks a lot!

JHHPC · May 21, 2008, 2:30pm

Hi,

By definition, shared memory is private to a block. So the reduction probably only coordinates the threads within a block, not the different blocks. In my opinion, that means all your blocks write to the same locations in global memory.

Johannes
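
To make the point about blocks colliding concrete, here is a minimal sketch (not the code from the original post, which was never shown) of a tree reduction where the shared array is swapped for a global scratch buffer. The assumption is that the scratch buffer holds one slice per block, so it needs gridDim.x * blockDim.x elements rather than the size of a single block's shared array. The names reduceGlobalScratch and sumWithGlobalScratch are made up for illustration, and the block size is assumed to be a power of two.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

// Tree reduction with the per-block shared array replaced by global memory.
// Each block indexes its own slice of `scratch`, so blocks never overwrite
// each other's partial sums. Assumes blockDim.x is a power of two.
__global__ void reduceGlobalScratch(const float *in, float *scratch,
                                    float *blockSums, int n)
{
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    float *slice = scratch + blockIdx.x * blockDim.x;  // this block's slice

    slice[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();  // barrier for the threads of this block only

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            slice[tid] += slice[tid + stride];
        __syncthreads();
    }
    if (tid == 0)
        blockSums[blockIdx.x] = slice[0];  // one partial sum per block
}

// Hypothetical host helper: runs the kernel and adds the per-block partial
// sums on the CPU.
float sumWithGlobalScratch(const std::vector<float> &h)
{
    const int n = static_cast<int>(h.size());
    if (n == 0) return 0.0f;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *dIn = nullptr, *dScratch = nullptr, *dSums = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dScratch, blocks * threads * sizeof(float));  // one slice per block
    cudaMalloc(&dSums, blocks * sizeof(float));
    cudaMemcpy(dIn, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    reduceGlobalScratch<<<blocks, threads>>>(dIn, dScratch, dSums, n);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    (void)err;

    std::vector<float> sums(blocks);
    cudaMemcpy(sums.data(), dSums, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dScratch);
    cudaFree(dSums);

    float total = 0.0f;
    for (float v : sums) total += v;
    return total;
}
```

If scratch were allocated with only blockDim.x elements and indexed by threadIdx.x alone, every block would read and write the same locations, which matches the failure described above.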

wumpus · May 21, 2008, 2:49pm

You can’t just replace shared with global memory, as you don’t have the same synchronization primitives. For example, __syncthreads() does not guarantee that all your global memory writes have finished.
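
A sketch of one common way around the missing cross-block primitive: as far as I understand the programming guide, __syncthreads() is a barrier only for the threads of one block, so nothing in it orders work between blocks. Ending the kernel and launching another one does give a grid-wide synchronization point, because launches in the same stream run in order. The kernel and helper names below (reducePass, sumMultiPass) are hypothetical, and the block size is assumed to be a power of two.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <utility>
#include <vector>

// One reduction pass: each block sums up to blockDim.x inputs in shared
// memory and writes a single partial sum to global memory.
__global__ void reducePass(const float *in, float *out, int n)
{
    extern __shared__ float sdata[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    sdata[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();  // synchronizes this block only, not the grid

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            sdata[tid] += sdata[tid + stride];
        __syncthreads();
    }
    if (tid == 0)
        out[blockIdx.x] = sdata[0];
}

// Hypothetical host helper: repeats the pass until one value is left.
// The partial sums of pass k are read only by pass k+1, so the launch
// boundary plays the role of the cross-block barrier.
float sumMultiPass(const std::vector<float> &h)
{
    int n = static_cast<int>(h.size());
    if (n == 0) return 0.0f;
    const int threads = 256;
    int blocks = (n + threads - 1) / threads;

    float *dA = nullptr, *dB = nullptr;
    cudaMalloc(&dA, n * sizeof(float));
    cudaMalloc(&dB, blocks * sizeof(float));
    cudaMemcpy(dA, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    while (n > 1) {
        blocks = (n + threads - 1) / threads;
        reducePass<<<blocks, threads, threads * sizeof(float)>>>(dA, dB, n);
        std::swap(dA, dB);  // output of this pass is the input of the next
        n = blocks;
    }
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    (void)err;

    float result = 0.0f;
    cudaMemcpy(&result, dA, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    return result;
}
```

The extra launches cost some overhead, but they avoid relying on any ordering between blocks inside a single kernel.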

grafd · May 21, 2008, 3:56pm

Thanks a lot guys.

Neeraj · June 1, 2008, 12:18am

Hi,

If you just want to experiment with reduction through global memory, try the __syncblocks() construct posted previously. It's just a spin loop that idles all the threads in an MP. The construct is illusive but fails for large blocks.

I'm curious whether anybody has tried the same construct using atomic instructions on compute capability 1.1 hardware?
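
On the atomics question, here is a sketch of a different use of atomic instructions than the __syncblocks() spin loop (that construct is not reproduced here). Instead of building a barrier, each block reduces its own data in shared memory and then adds its partial sum to a single global total with atomicAdd. Integer data is assumed, since the integer atomicAdd on global memory is what compute capability 1.1 provides. The names reduceAtomic and sumAtomic are hypothetical, and the block size is assumed to be a power of two.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

// Each block reduces its inputs in shared memory; thread 0 then merges the
// block's partial sum into *total with an atomic add, so no cross-block
// barrier is needed. *total must be zeroed before the launch.
__global__ void reduceAtomic(const int *in, int *total, int n)
{
    extern __shared__ int partial[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    partial[tid] = (i < n) ? in[i] : 0;
    __syncthreads();

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            partial[tid] += partial[tid + stride];
        __syncthreads();
    }
    if (tid == 0)
        atomicAdd(total, partial[0]);  // one atomic per block
}

// Hypothetical host helper around the kernel.
int sumAtomic(const std::vector<int> &h)
{
    const int n = static_cast<int>(h.size());
    if (n == 0) return 0;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    int *dIn = nullptr, *dTotal = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dTotal, sizeof(int));
    cudaMemcpy(dIn, h.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dTotal, 0, sizeof(int));  // the accumulator starts at zero

    reduceAtomic<<<blocks, threads, threads * sizeof(int)>>>(dIn, dTotal, n);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    (void)err;

    int result = 0;
    cudaMemcpy(&result, dTotal, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dTotal);
    return result;
}
```

Because the atomic add is order-independent for integers, the result does not depend on which block finishes first.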
