At least not officially. If you launch at most as many blocks as can run concurrently in one wave, you might get lucky and find the contents still there on the next invocation of the same kernel. There are no guarantees, however, not even that the same block ends up in the same shared memory area on the same SM.
I haven’t heard of any experiments in this direction either (although they would be simple to carry out). I’m pretty sure experiments have a higher chance of success on compute capability 1.x devices than on Fermi.
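For what it's worth, such an experiment could look roughly like the sketch below. It reads shared memory that was never initialized, which CUDA leaves undefined, so whatever it reports only describes one particular device, driver and launch configuration. The names `fillShared`, `probeShared`, `probeSharedPersistence` and the size `WORDS` are made up for illustration.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int WORDS = 1024;  // 4 KB of shared memory per block (arbitrary choice)

// First launch: every block writes a recognizable pattern into shared memory.
__global__ void fillShared(unsigned int* sink)
{
    __shared__ volatile unsigned int buf[WORDS];
    for (int i = threadIdx.x; i < WORDS; i += blockDim.x)
        buf[i] = 0xC0DE0000u + i;
    __syncthreads();
    // Read one word back so the stores cannot be discarded.
    if (threadIdx.x == 0) sink[blockIdx.x] = buf[WORDS - 1];
}

// Second launch: read shared memory WITHOUT initializing it (undefined
// contents by specification) and count words that still hold the pattern.
__global__ void probeShared(int* matches)
{
    __shared__ volatile unsigned int buf[WORDS];
    int local = 0;
    for (int i = threadIdx.x; i < WORDS; i += blockDim.x)
        if (buf[i] == 0xC0DE0000u + i) ++local;
    atomicAdd(&matches[blockIdx.x], local);
}

// Returns the total number of surviving words over all blocks,
// or -1 if a CUDA call failed.
int probeSharedPersistence(int blocks, int threads)
{
    assert(blocks > 0 && threads > 0);
    unsigned int* dSink = nullptr;
    int* dMatches = nullptr;
    std::vector<int> hMatches(blocks, 0);

    bool ok = cudaMalloc(&dSink, blocks * sizeof(unsigned int)) == cudaSuccess
           && cudaMalloc(&dMatches, blocks * sizeof(int)) == cudaSuccess
           && cudaMemset(dMatches, 0, blocks * sizeof(int)) == cudaSuccess;
    if (ok) {
        // Both launches go to the default stream, so they run back to back.
        fillShared<<<blocks, threads>>>(dSink);
        probeShared<<<blocks, threads>>>(dMatches);
        ok = cudaMemcpy(hMatches.data(), dMatches, blocks * sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dSink);
    cudaFree(dMatches);
    if (!ok) return -1;

    int total = 0;
    for (int m : hMatches) total += m;
    return total;
}
```

Calling `probeSharedPersistence(blocks, threads)` with `blocks` no larger than one wave, and then again with more, would show whether anything survives on a given card. A nonzero count is an observation, not something to build on.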
The fact that nobody has reported on this might indicate that it’s just not worth doing. The real answer to your problem is probably to do more work per block, so that the cost of fetching from global memory is amortized over more computation.
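As a rough illustration of "more work per block", here is a minimal sketch that assumes the data you want to keep around is a small lookup table. Each block stages the table in shared memory once and then walks the input with a grid-stride loop, so a modest number of blocks covers the whole array. The kernel `applyTable`, the helper `runApplyTable` and `TABLE_SIZE` are hypothetical names, not anything from your code.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int TABLE_SIZE = 256;  // hypothetical lookup-table size

// Each block copies the table into shared memory once, then processes many
// elements, so the staging cost is spread over all of them.
__global__ void applyTable(const float* table, const unsigned char* in,
                           float* out, int n)
{
    __shared__ float sTable[TABLE_SIZE];

    for (int i = threadIdx.x; i < TABLE_SIZE; i += blockDim.x)
        sTable[i] = table[i];
    __syncthreads();

    // Grid-stride loop: the launch uses far fewer threads than elements.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        out[i] = sTable[in[i]];
}

// Host helper: runs the kernel on generated data and returns true if the
// GPU result matches a CPU reference.
bool runApplyTable(int n, int blocks, int threads)
{
    assert(n > 0 && blocks > 0 && threads > 0);
    std::vector<float> hTable(TABLE_SIZE);
    std::vector<unsigned char> hIn(n);
    std::vector<float> hOut(n);
    for (int i = 0; i < TABLE_SIZE; ++i) hTable[i] = 0.5f * i;
    for (int i = 0; i < n; ++i)
        hIn[i] = static_cast<unsigned char>(i % TABLE_SIZE);

    float* dTable = nullptr;
    float* dOut = nullptr;
    unsigned char* dIn = nullptr;
    bool ok = cudaMalloc(&dTable, TABLE_SIZE * sizeof(float)) == cudaSuccess
           && cudaMalloc(&dIn, n) == cudaSuccess
           && cudaMalloc(&dOut, n * sizeof(float)) == cudaSuccess
           && cudaMemcpy(dTable, hTable.data(), TABLE_SIZE * sizeof(float),
                         cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(dIn, hIn.data(), n,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        applyTable<<<blocks, threads>>>(dTable, dIn, dOut, n);
        ok = cudaMemcpy(hOut.data(), dOut, n * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    for (int i = 0; ok && i < n; ++i)
        ok = (hOut[i] == hTable[hIn[i]]);

    cudaFree(dTable);
    cudaFree(dIn);
    cudaFree(dOut);
    return ok;
}
```

With something like `runApplyTable(1 << 20, 32, 256)`, each of the 32 blocks loads the table once and then handles about 32K elements, instead of thousands of small blocks each reloading it.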