Can we send data from host to device while the device keeps running kernels?
You can on hardware with compute capability 1.1. See the section "Asynchronous Concurrent Execution" in the CUDA Programming Guide for more explanation, and Appendix A for a list of compute capability 1.1 devices.
(Edit: Of course, that’s 1.1 or greater. The new 1.3 devices also allow concurrent memcpy and execution.)
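A minimal sketch of the technique, assuming a current CUDA runtime (older toolkits spelled `cudaDeviceSynchronize` as `cudaThreadSynchronize`). The array is split into chunks, and each chunk gets its own stream with an async upload, a kernel launch and an async download. The host buffer is page-locked via `cudaMallocHost`, which `cudaMemcpyAsync` needs in order to overlap with kernels. The kernel `scale` and the helper `scale_overlapped` are hypothetical names for illustration. Whether copies really run concurrently with kernels depends on the device (its `asyncEngineCount`, or `deviceOverlap` in older toolkits). The results are correct either way.

```cuda
#include <cuda_runtime.h>
#include <string.h>

// Illustrative (hypothetical) kernel: multiply every element of a chunk by `factor`.
__global__ void scale(float *d, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= factor;
}

// Hypothetical helper: computes out[i] = in[i] * factor, splitting the
// array into nStreams chunks that are each processed in their own stream.
// Assumes n is divisible by nStreams and 1 <= nStreams <= MAX_STREAMS.
cudaError_t scale_overlapped(const float *in, float *out, int n,
                             int nStreams, float factor) {
    enum { MAX_STREAMS = 8 };
    if (nStreams < 1 || nStreams > MAX_STREAMS || n % nStreams != 0)
        return cudaErrorInvalidValue;
    const int chunk = n / nStreams;
    const size_t bytes = (size_t)n * sizeof(float);
    const size_t chunkBytes = (size_t)chunk * sizeof(float);

    // Page-locked (pinned) host memory: async copies from pageable memory
    // cannot overlap with kernel execution.
    float *h_buf = NULL, *d_buf = NULL;
    cudaError_t err = cudaMallocHost((void **)&h_buf, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMalloc((void **)&d_buf, bytes);
    if (err != cudaSuccess) { cudaFreeHost(h_buf); return err; }
    memcpy(h_buf, in, bytes);

    cudaStream_t streams[MAX_STREAMS];
    for (int s = 0; s < nStreams; ++s) cudaStreamCreate(&streams[s]);

    // Issue work breadth-first: all uploads, then all kernels, then all
    // downloads. Operations within one stream stay ordered, but the upload
    // for one chunk may run while the kernel for another chunk executes.
    for (int s = 0; s < nStreams; ++s)
        cudaMemcpyAsync(d_buf + s * chunk, h_buf + s * chunk, chunkBytes,
                        cudaMemcpyHostToDevice, streams[s]);
    for (int s = 0; s < nStreams; ++s)
        scale<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(
            d_buf + s * chunk, chunk, factor);
    for (int s = 0; s < nStreams; ++s)
        cudaMemcpyAsync(h_buf + s * chunk, d_buf + s * chunk, chunkBytes,
                        cudaMemcpyDeviceToHost, streams[s]);

    // Block the host until all streams finish, then pick up any error.
    err = cudaDeviceSynchronize();
    if (err == cudaSuccess) err = cudaGetLastError();
    if (err == cudaSuccess) memcpy(out, h_buf, bytes);

    for (int s = 0; s < nStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return err;
}
```

For example, `scale_overlapped(in, out, 1024, 4, 2.0f)` processes 1024 floats in four streams. Issuing the loops breadth-first (rather than upload, kernel, download per stream) gives devices with a single copy engine more chances to overlap a copy with a kernel.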