I just started using CUDA, but I bumped into a little issue.
I have a grid of blocks, and within each block all threads access the same block of shared memory to perform operations. I really need to make full use of my shared memory since I have to operate on quite large data sets. I want to be able to store a 1D array of up to 4K float values (or slightly fewer, to avoid coming too close to the 16 KB limit). Splitting this up is not trivial because of the nature of my algorithm.
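Concretely, the allocation I have in mind looks something like this (just a sketch; the kernel name is a placeholder and the array size is an example chosen to stay under 16 KB):

```
#define BLOCK_DATA 3840  // 3840 floats * 4 bytes = 15 KB, safely under the 16 KB limit

__global__ void myKernel(/* ... */)
{
    // one array shared by all threads of the block
    __shared__ float data[BLOCK_DATA];

    // ... all threads in the block operate on data[] ...
}
```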
Now my questions:
1. If I have more blocks than multiprocessors (which is always the case), can this cause execution to fail because the runtime schedules several blocks concurrently onto one multiprocessor and runs out of shared memory? I suppose the blocks will just be serialized, but I'm not sure.
2. Is there a way to initialize the shared memory? All threads within one block will add values (probably read from texture memory) to the shared memory, in a non-trivial sequence (therefore requiring __syncthreads()). However, the memory should be initialized to 0 before any of these operations are performed. I cannot simply let each thread initialize a certain chunk of shared memory to 0, since this might erase values already accumulated by other threads.
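To make the second question concrete, the naive pattern I'm unsure about looks like this (a sketch; the kernel name and size are placeholders):

```
__global__ void accumulate(/* ... */)
{
    __shared__ float data[3840];

    // each thread zeroes a strided slice of the shared array
    for (int i = threadIdx.x; i < 3840; i += blockDim.x)
        data[i] = 0.0f;

    __syncthreads();  // is this barrier enough to guarantee the zeroing
                      // cannot erase values accumulated afterwards?

    // ... non-trivial accumulation into data[],
    //     with __syncthreads() between phases ...
}
```

Is a single barrier between the zeroing and the accumulation phases the right way to do this, or is there a dedicated initialization mechanism I'm missing?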
Hope you guys can help me out a bit. Thanks in advance.