Shared Memory Persistence Conditions.

Edans_Sandes · May 2, 2009, 1:19am

Is there any way to ensure that a shared array persists between calls of the same kernel? If so, under which conditions is it guaranteed?

In my program, I was always saving a shared array (512 bytes) into global memory before each block finished, and reading it back when the next block started. Today I removed this step and the algorithm keeps running correctly, so the shared array did persist between kernel launches (persistence happens only between blocks with the same ID, of course). I got a 3% speedup.

My GPU has 4 MPs and I launch 8 blocks in each kernel invocation. I just want to know if there is any way to guarantee that shared memory persists (e.g., block count <= 2 x MPs).

The Programming Guide states that “The global, constant, and texture memory spaces are persistent across kernel launches by the same application.” Nothing is said about shared memory.
The topic http://forums.nvidia.com/index.php?showtopic=90245&pid=509815&mode=threaded&start=#entry509815 says that “shared memory only has the lifetime of a single block”.

Any help is appreciated.
Thanks in advance.

Jamie_K · May 2, 2009, 6:59am

I would never depend on this behavior. On the host, if you free() and then malloc(), it’s conceivable you could get the same block of memory with the same data, but it is unspecified and may change at any time for any reason, or for no reason. I would treat shared memory the same way.

Using uninitialized shared memory will certainly fail if you run more blocks than will simultaneously fit on the MPs. This means if in the future someone runs your algorithm on a device with fewer MPs than you designed for, your algorithm will definitely break.
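
To make the point about block count versus MPs concrete, here is a minimal sketch that asks the runtime how many blocks of a kernel could be resident at once on the current device. The kernel `work` and the launch sizes are hypothetical stand-ins for the original poster's code, and `cudaOccupancyMaxActiveBlocksPerMultiprocessor` is a runtime call from CUDA releases newer than this thread. Even when every block fits, the contents of shared memory at block start remain undefined; the number only shows how easily the "it happened to work" condition changes from one device to another.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the poster's; it uses a 512-byte shared array.
__global__ void work(int *out)
{
    __shared__ int buf[128];            // 128 ints * 4 bytes = 512 bytes
    buf[threadIdx.x] = threadIdx.x;     // always written before it is read
    __syncthreads();
    out[blockIdx.x * blockDim.x + threadIdx.x] = buf[(threadIdx.x + 1) % blockDim.x];
}

int main()
{
    const int threads = 128;            // assumed block size (matches buf)
    const int blocks  = 8;              // block count mentioned in the thread

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("no CUDA device found\n");
        return 1;
    }

    // How many blocks of `work` can be resident on one multiprocessor at once.
    int perMP = 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&perMP, work, threads, 0);

    int resident = perMP * prop.multiProcessorCount;
    printf("MPs: %d, resident blocks possible: %d, blocks launched: %d\n",
           prop.multiProcessorCount, resident, blocks);
    if (blocks > resident)
        printf("some blocks must reuse an MP after earlier blocks retire\n");
    return 0;
}
```

The scheduler is also free to place a block with a given ID on any MP, so a favorable result here is not a guarantee of anything.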

tmurray · May 2, 2009, 7:50am

Contents of shared memory are undefined at block initialization time. Don’t even ask “well what if I do _______”–they’re undefined, no matter what you do or hacks you think you have :)

Edans_Sandes · May 2, 2009, 3:04pm

Thanks very much.

I will keep the store/load step through global memory.
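
For readers landing on this thread, the store/load step could look roughly like the sketch below. It is not the original poster's code: the kernel name `step`, the placeholder computation, and the sizes (128 ints = 512 bytes per block, 8 blocks) are assumptions chosen to match the numbers mentioned above. Each block loads its slice of state from global memory into shared memory, works on it, and writes it back before finishing, so nothing depends on what shared memory contains when a block starts.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 128   // ints of per-block state: 128 * 4 bytes = 512 bytes

// Hypothetical kernel: per-block state lives in global memory between launches.
__global__ void step(int *state)
{
    __shared__ int s[N];
    int *mine = state + blockIdx.x * N;     // this block's slice of global state

    s[threadIdx.x] = mine[threadIdx.x];     // load: global -> shared
    __syncthreads();

    int v = s[(threadIdx.x + 1) % N] + 1;   // placeholder work on the shared copy
    __syncthreads();                        // all reads done before anyone overwrites
    s[threadIdx.x] = v;

    mine[threadIdx.x] = s[threadIdx.x];     // store: shared -> global
}

int main()
{
    const int blocks = 8;
    const size_t bytes = blocks * N * sizeof(int);

    int *d_state = nullptr;
    if (cudaMalloc(&d_state, bytes) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_state, 0, bytes);          // defined initial state: all zeros

    step<<<blocks, N>>>(d_state);           // first launch
    step<<<blocks, N>>>(d_state);           // second launch resumes from saved state

    int first = -1;
    cudaMemcpy(&first, d_state, sizeof(int), cudaMemcpyDeviceToHost);
    printf("state[0] after two launches: %d\n", first);   // 2 if both launches ran

    cudaFree(d_state);
    return 0;
}
```

The extra global-memory traffic is what the 3% in the first post was measuring; in exchange, the result no longer depends on how many blocks fit on the device or where they get scheduled.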
