Hello,
For my problem it would be convenient to run a kernel on some data, get some results, and then return control to the CPU so it can allocate memory on the GPU, with the amount depending on those results. I would then like to launch work on the GPU again with the contents of the on-chip cache (shared memory, if you will) still intact and undisturbed.
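Shared memory belongs to a thread block, and its contents are undefined when a new kernel starts, so anything that must survive the round trip to the CPU has to be written to global memory first. Below is a minimal sketch of the usual count, allocate, fill pattern. It assumes the "results" are simply the number of positive elements in an input array; the names `countPositive`, `writePositive` and `selectPositive` are hypothetical, and error checking is omitted for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Pass 1 (hypothetical): count the elements we want to keep.
// The counter lives in global memory, so it outlives the kernel;
// shared memory would not.
__global__ void countPositive(const int* in, int n, int* count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && in[i] > 0) atomicAdd(count, 1);
}

// Pass 2 (hypothetical): write the kept elements into the buffer sized from pass 1.
// Any data this kernel needs in shared memory must be reloaded from global
// memory here, because shared memory starts out undefined at every launch.
__global__ void writePositive(const int* in, int n, int* out, int* cursor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && in[i] > 0) out[atomicAdd(cursor, 1)] = in[i];
}

// Hypothetical host helper: returns the positive elements of h_in
// (order is NOT preserved, because slots are claimed with atomicAdd).
std::vector<int> selectPositive(const std::vector<int>& h_in) {
    int n = static_cast<int>(h_in.size());
    if (n == 0) return {};  // a zero-block launch would be an invalid configuration

    int *d_in = nullptr, *d_counter = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_counter, sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_counter, 0, sizeof(int));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    countPositive<<<blocks, threads>>>(d_in, n, d_counter);

    // Control returns to the CPU: this copy waits for pass 1 to finish.
    int h_count = 0;
    cudaMemcpy(&h_count, d_counter, sizeof(int), cudaMemcpyDeviceToHost);

    // Allocate exactly as much device memory as pass 1 reported.
    std::vector<int> result(h_count);
    if (h_count > 0) {
        cudaMalloc(&d_out, h_count * sizeof(int));
        cudaMemset(d_counter, 0, sizeof(int));  // reuse the counter as a write cursor
        writePositive<<<blocks, threads>>>(d_in, n, d_out, d_counter);
        cudaMemcpy(result.data(), d_out, h_count * sizeof(int), cudaMemcpyDeviceToHost);
        cudaFree(d_out);
    }
    cudaFree(d_counter);
    cudaFree(d_in);
    return result;
}
```

For example, `selectPositive({3, -1, 0, 5, -7, 2})` yields the three values 3, 5 and 2 in some order. If the original order matters, a prefix sum (scan) over per-element flags is the usual replacement for the atomic cursor.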
Moreover, shared memory and the texture caches are so small that, unless your kernel is extremely short, the time required to refill them at kernel startup is probably negligible.
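A rough back-of-the-envelope estimate illustrates the point. The figures here are assumptions chosen purely for illustration and do not come from this thread: a GPU with 16 multiprocessors, 48 KiB of shared memory each, and 200 GB/s of global-memory bandwidth. Reloading every byte of shared memory from global memory then costs about:

```latex
t_{\text{refill}} \approx \frac{N_{\text{SM}} \cdot S_{\text{shared}}}{B_{\text{global}}}
  = \frac{16 \times 48\,\text{KiB}}{200\,\text{GB/s}}
  = \frac{786\,432\,\text{B}}{2 \times 10^{11}\,\text{B/s}}
  \approx 3.9\,\mu\text{s}
```

That is small next to any kernel whose running time is measured in milliseconds.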