Shared Memory Persistence Conditions

Is there any way to ensure that a shared array persists between calls of the same kernel? If so, under which conditions is it guaranteed?

In my program, I was always saving a shared array (512 bytes) into global memory before the block finished, and reading it back when the next block started. Today I removed this step and the algorithm kept running correctly, which seemed to show that the shared array persisted between kernel launches (only between blocks with the same ID, of course). I got a 3% speedup.

My GPU has 4 MPs and I launch 8 blocks in each kernel invocation. I just want to know if there is any way to guarantee that the shared memory persists (e.g., block count <= 2 x MPs).

The Programming Guide states that “The global, constant, and texture memory spaces are persistent across kernel launches by the same application.” It says nothing about shared memory.
The topic http://forums.nvidia.com/index.php?showtopic=90245&pid=509815&mode=threaded&start=#entry509815 says that “shared memory only has the lifetime of a single block”.

Any help is appreciated.
Thanks in advance.

I would never depend on this behavior. On the host, if you free() and then malloc(), it’s conceivable you could get the same block of memory with the same data, but it is unspecified and may change at any time for any reason, or for no reason. I would treat shared memory the same way.

Using uninitialized shared memory will certainly fail if you run more blocks than will simultaneously fit on the MPs. This means that if someone later runs your algorithm on a device with fewer MPs than you designed for, it will definitely break.
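For reference, the number of blocks that can be resident at the same time is the per-MP occupancy times the MP count. A minimal sketch, assuming CUDA 6.5 or later (which added cudaOccupancyMaxActiveBlocksPerMultiprocessor); demoKernel and maxResidentBlocks are hypothetical names, and the kernel only exists to carry a 512-byte shared array like the one in the question. Blocks beyond this count have to wait for a resident block to finish and are then scheduled into the resources it freed, and even below this count nothing guarantees what shared memory contains when a block starts.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel with a 512-byte static shared array (128 floats),
// used only so the occupancy calculator has something to inspect.
__global__ void demoKernel(float* out)
{
    __shared__ float s[128];
    // Initialize before use: shared contents are undefined at block start.
    if (threadIdx.x < 128) s[threadIdx.x] = float(threadIdx.x);
    __syncthreads();
    if (threadIdx.x < 128) out[blockIdx.x * 128 + threadIdx.x] = s[127 - threadIdx.x];
}

// Upper bound on how many blocks of demoKernel (with blockSize threads each)
// can be resident at once on device 0. Returns -1 on any CUDA error.
int maxResidentBlocks(int blockSize)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return -1;

    int perSM = 0;
    // Last argument: no dynamic shared memory; the 512 bytes are static.
    if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(&perSM, demoKernel,
                                                      blockSize, 0) != cudaSuccess)
        return -1;

    return perSM * prop.multiProcessorCount;
}
```

Launching more blocks than maxResidentBlocks(blockSize) means some blocks necessarily start on hardware that a different block used earlier in the same launch, which is the failure case described above.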

Contents of shared memory are undefined at block initialization time. Don’t even ask “well, what if I do _______”; they’re undefined no matter what you do or what hacks you think you have :)

Thanks very much.

I will stick with the store/load step through global memory.
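The store/load step could look like the sketch below, assuming a 512-byte per-block state (128 floats) kept in a global buffer indexed by blockIdx.x. The names stepKernel and runLaunches, and the placeholder update, are hypothetical. Each block restores its state from global memory before using it and writes it back before exiting, so correctness no longer depends on whatever happens to be left in shared memory.

```cuda
#include <cuda_runtime.h>

// 128 floats * 4 bytes = 512 bytes of per-block state, as in the question.
constexpr int kStateWords = 128;

// Hypothetical kernel: each block carries its own state across launches
// through global memory, never through leftover shared memory.
__global__ void stepKernel(float* blockState)
{
    __shared__ float s[kStateWords];
    float* mine = blockState + blockIdx.x * kStateWords;

    // Load: shared memory is undefined at block start, so restore from global.
    for (int i = threadIdx.x; i < kStateWords; i += blockDim.x)
        s[i] = mine[i];
    __syncthreads();

    // Placeholder work: add (block id + 1) to every element.
    for (int i = threadIdx.x; i < kStateWords; i += blockDim.x)
        s[i] += float(blockIdx.x + 1);
    __syncthreads();  // a real algorithm may read other threads' entries

    // Store: save state before the block exits so the next launch can restore it.
    for (int i = threadIdx.x; i < kStateWords; i += blockDim.x)
        mine[i] = s[i];
}

// Runs `launches` consecutive launches of `numBlocks` blocks, starting from
// all-zero state, and copies the final state into hostState
// (numBlocks * kStateWords floats). Returns false on any CUDA error.
bool runLaunches(int numBlocks, int launches, float* hostState)
{
    size_t bytes = size_t(numBlocks) * kStateWords * sizeof(float);
    float* d = nullptr;
    if (cudaMalloc(&d, bytes) != cudaSuccess) return false;

    bool ok = cudaMemset(d, 0, bytes) == cudaSuccess;  // all-zero bits == 0.0f
    for (int k = 0; ok && k < launches; ++k)
        stepKernel<<<numBlocks, 64>>>(d);  // same-stream launches run in order

    ok = ok && cudaGetLastError() == cudaSuccess
            && cudaMemcpy(hostState, d, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d);
    return ok;
}
```

For example, runLaunches(8, 3, h) with a host buffer of 8 * 128 floats leaves every element belonging to block b equal to 3 * (b + 1), regardless of how many MPs the device has.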