Yes, data written to global memory by one kernel remains there and can be processed by a subsequent kernel. Note that both kernels should execute in the same stream; otherwise you may have a race condition unless you add explicit synchronization (for example, an event or a device-wide synchronization between the launches). A minimal sketch follows.
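Here is a minimal sketch (kernel names and launch parameters are illustrative, not from any particular codebase): `kernelA` writes its result to a global-memory buffer, and `kernelB`, launched later on the same stream, reads that result. Stream ordering guarantees `kernelB` does not start until `kernelA` has finished.

```cuda
// kernelA produces an intermediate result in global memory.
__global__ void kernelA(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = 2.0f * i;
}

// kernelB consumes what kernelA left behind in the same buffer.
__global__ void kernelB(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1.0f;
}

// Host side: both launches go into the same stream, so they execute in order.
// kernelA<<<blocks, threads, 0, stream>>>(d_data, n);
// kernelB<<<blocks, threads, 0, stream>>>(d_data, n);
```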
Much practical CUDA processing follows this pattern: source data is copied from the host (CPU) to the device (GPU); a sequence of CUDA kernels transforms it, with all intermediate data remaining in GPU global memory; and finally the results are copied back to the host. You can think of it as a processing pipeline built from CUDA kernels.
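As a sketch of that pipeline (the function name `runPipeline` and the use of the two illustrative kernels above are assumptions, not anything prescribed by CUDA), the host code copies the input to the device, runs the kernels back to back on one stream, and copies the result back. The intermediate data produced by `kernelA` never leaves GPU global memory.

```cuda
#include <cuda_runtime.h>

void runPipeline(const float *h_in, float *h_out, int n)
{
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Host -> device copy of the source data.
    cudaMemcpyAsync(d_data, h_in, n * sizeof(float), cudaMemcpyHostToDevice, stream);

    int threads = 256;
    int blocks  = (n + threads - 1) / threads;

    // Both kernels in the same stream: kernelB sees kernelA's output in global memory.
    kernelA<<<blocks, threads, 0, stream>>>(d_data, n);
    kernelB<<<blocks, threads, 0, stream>>>(d_data, n);

    // Device -> host copy of the final result.
    cudaMemcpyAsync(h_out, d_data, n * sizeof(float), cudaMemcpyDeviceToHost, stream);

    cudaStreamSynchronize(stream);   // wait for the whole pipeline to finish

    cudaStreamDestroy(stream);
    cudaFree(d_data);
}
```

Note that the async copies only overlap with other work if the host buffers are pinned (allocated with `cudaHostAlloc`/`cudaMallocHost`); with pageable memory they still work correctly but behave synchronously.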