Can the host manipulate device memory when a kernel isn't running?

When a kernel has finished, is it possible for the host to manipulate portions of the data in device memory before running a kernel again? Using cudaMemcpy to copy the data to the host, manipulating it, and then copying it back is quite slow when only about 1 MB of an 800 MB data set needs to be altered.

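Yes. You can update just the relevant 1 MB of device data from the host with cudaMemcpy() or cudaMemset(); there is no need to copy the entire 800 MB data set back and forth. Both functions accept any address inside a device allocation, so you can pass the base device pointer plus an offset and transfer only the bytes that change.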
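
As a rough illustration, here is a minimal sketch of such a partial update. The kernel `process`, the helper `patch_and_rerun`, and the sizes and offset are all hypothetical, chosen only to mirror the numbers in the question; it also assumes the GPU has at least 800 MB of free memory.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",              \
                         cudaGetErrorString(err_), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Hypothetical stand-in for the real kernel: doubles every element.
__global__ void process(float *data, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Runs the kernel, patches a 1 MB window from the host, runs the kernel again.
// Returns true if the patched window holds the expected value afterwards.
bool patch_and_rerun()
{
    const size_t total  = 200u * 1024 * 1024;  // 200M floats = 800 MB (assumed size)
    const size_t offset = 1000;                // first element to change (assumed)
    const size_t count  = 256 * 1024;          // 256K floats = 1 MB

    float *d_data = nullptr;
    CUDA_CHECK(cudaMalloc(&d_data, total * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_data, 0, total * sizeof(float)));

    const unsigned threads = 256;
    const unsigned blocks  = (unsigned)((total + threads - 1) / threads);

    // First kernel run; wait until it has finished.
    process<<<blocks, threads>>>(d_data, total);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());

    // Overwrite only the 1 MB window; the rest of the data stays on the device.
    std::vector<float> patch(count, 1.0f);
    CUDA_CHECK(cudaMemcpy(d_data + offset, patch.data(),
                          count * sizeof(float), cudaMemcpyHostToDevice));

    // Second kernel run sees the patched data.
    process<<<blocks, threads>>>(d_data, total);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());

    // Copy back just the patched window to inspect it.
    std::vector<float> check(count);
    CUDA_CHECK(cudaMemcpy(check.data(), d_data + offset,
                          count * sizeof(float), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_data));

    return check.front() == 2.0f && check.back() == 2.0f;
}

int main()
{
    std::printf("patched window %s\n", patch_and_rerun() ? "ok" : "unexpected");
    return 0;
}
```

If the new values depend on the old ones, the same pattern works in both directions: copy only the 1 MB window to the host with cudaMemcpyDeviceToHost, modify it, and copy it back to the same offset.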