Is it possible to access CUDA shared memory directly from the host?
Not possible. It is only accessible from CUDA device code.
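The host cannot read shared memory directly, so the usual workaround is to have the kernel copy whatever it computed in shared memory out to a global-memory buffer, which the host then reads with `cudaMemcpy`. Below is a minimal sketch of that pattern. The kernel and helper names (`dumpShared`, `stageThroughGlobal`) and the block size are hypothetical choices for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kThreads = 128;  // hypothetical block size

// Each block fills its shared array, then copies it out to global memory,
// which is the only place the host can read the data from.
__global__ void dumpShared(int* out)
{
    __shared__ int tile[kThreads];          // exists only while this block runs
    tile[threadIdx.x] = threadIdx.x * 2;    // some work done in shared memory
    __syncthreads();
    // Explicit copy: shared -> global
    out[blockIdx.x * blockDim.x + threadIdx.x] = tile[threadIdx.x];
}

// Launches the kernel, copies the global buffer back to the host, and returns
// the number of mismatched elements (or -1 if the allocation fails).
int stageThroughGlobal(int blocks)
{
    const int n = blocks * kThreads;
    int* d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return -1;
    dumpShared<<<blocks, kThreads>>>(d_out);

    int* h_out = new int[n];
    // cudaMemcpy on the default stream waits for the kernel to finish first
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);

    int errors = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != (i % kThreads) * 2) ++errors;

    delete[] h_out;
    cudaFree(d_out);
    return errors;
}

int main()
{
    int errors = stageThroughGlobal(4);
    printf("mismatches: %d\n", errors);
    return errors == 0 ? 0 : 1;
}
```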
And it's also not reliable to count on the contents of shared memory persisting from one kernel launch to the next, right?
Correct. Shared memory is not even visible from one threadblock to the next: each threadblock in a kernel has its own logical shared memory array.
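A small sketch that illustrates the per-block isolation: every block declares the same `__shared__` variable, but each block writes and reads only its own private instance. The names `whoOwnsShared` and `eachBlockHasOwnShared` are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Every block declares the same __shared__ variable, but each block gets its
// own private instance; no block can observe another block's write.
__global__ void whoOwnsShared(int* out)
{
    __shared__ int owner;
    if (threadIdx.x == 0) owner = blockIdx.x;   // one writer per block
    __syncthreads();                            // make the write visible block-wide
    out[blockIdx.x * blockDim.x + threadIdx.x] = owner;
}

// Returns true if every thread read back its own block's index.
bool eachBlockHasOwnShared(int blocks, int threads)
{
    const int n = blocks * threads;
    int* d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return false;
    whoOwnsShared<<<blocks, threads>>>(d_out);

    int* h_out = new int[n];
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);

    bool ok = true;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != i / threads) ok = false;

    delete[] h_out;
    cudaFree(d_out);
    return ok;
}

int main()
{
    bool ok = eachBlockHasOwnShared(8, 64);
    printf("%s\n", ok ? "each block saw its own value" : "unexpected");
    return ok ? 0 : 1;
}
```

Note that this only shows isolation between blocks of one launch; reading shared memory in a later launch without writing it first is undefined, so there is nothing meaningful to test there.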
If you want to extend the lifetime of data in shared memory, use fewer blocks and do more work in each of them. But be aware of occupancy: fewer blocks might no longer fill the entire GPU.
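One common way to apply this is a grid-stride loop: each block loads data into shared memory once and then reuses it for many elements, while the grid is sized from the SM count and an occupancy query so it still covers the whole GPU. The sketch below assumes a hypothetical 256-entry lookup table as the shared data; `lookupPersistent` and `runPersistent` are illustrative names, not part of any library.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kThreads = 256;
constexpr int kTableSize = 256;  // hypothetical lookup-table size

// The table is loaded into shared memory once per block, then reused for every
// element the block handles in its grid-stride loop.
__global__ void lookupPersistent(const unsigned char* in, int* out, int n,
                                 const int* table)
{
    __shared__ int sTable[kTableSize];
    for (int i = threadIdx.x; i < kTableSize; i += blockDim.x)
        sTable[i] = table[i];
    __syncthreads();

    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        out[i] = sTable[in[i]];
}

// Sizes the grid to fill the GPU (not one block per tile), runs, and verifies.
bool runPersistent(int n)
{
    int dev = 0, sms = 0, blocksPerSm = 0;
    cudaGetDevice(&dev);
    cudaDeviceGetAttribute(&sms, cudaDevAttrMultiProcessorCount, dev);
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSm, lookupPersistent,
                                                  kThreads, 0);
    int blocks = sms * blocksPerSm;  // enough resident blocks to occupy every SM
    if (blocks < 1) blocks = 1;

    unsigned char* h_in = new unsigned char[n];
    int h_table[kTableSize];
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<unsigned char>(i % kTableSize);
    for (int i = 0; i < kTableSize; ++i) h_table[i] = i * i;

    unsigned char* d_in = nullptr;
    int* d_out = nullptr;
    int* d_table = nullptr;
    cudaMalloc(&d_in, n);
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMalloc(&d_table, sizeof h_table);
    cudaMemcpy(d_in, h_in, n, cudaMemcpyHostToDevice);
    cudaMemcpy(d_table, h_table, sizeof h_table, cudaMemcpyHostToDevice);

    lookupPersistent<<<blocks, kThreads>>>(d_in, d_out, n, d_table);

    int* h_out = new int[n];
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    bool ok = true;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != h_table[i % kTableSize]) ok = false;

    delete[] h_in;
    delete[] h_out;
    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_table);
    return ok;
}

int main()
{
    bool ok = runPersistent(1 << 22);
    printf("%s\n", ok ? "ok" : "mismatch");
    return ok ? 0 : 1;
}
```

The occupancy query is what addresses the caveat above: the grid is no larger than the GPU can keep resident, but it is still large enough to put blocks on every SM.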