Send data and calculate at the same time?

Can we send data from the host to the device while the device keeps running kernels?

You can, on hardware with compute capability 1.1. See Section 4.5.1.5 (Asynchronous Concurrent Execution) in the CUDA Programming Guide for details, and Appendix A for a list of compute capability 1.1 devices.

(Edit: Of course, that’s 1.1 or greater. The new 1.3 devices also allow concurrent memcpy and execution.)
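
For anyone who wants to try it, here is a rough sketch of how the overlap is usually set up with the runtime API. It assumes the host buffer is page-locked (allocated with cudaMallocHost) and that the copies and kernels are issued into non-default streams with cudaMemcpyAsync. The kernel `scale`, the helper `run_overlap_demo`, the chunk size and the stream count are all made up for illustration. The capability check uses the current `cudaDevAttrAsyncEngineCount` attribute; as far as I know, older toolkits exposed the same information as the `deviceOverlap` device property. Whether the copy and the kernel really run concurrently depends on the device and driver, so treat this as a starting point rather than a guarantee.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: doubles every element in place.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical helper: returns the number of wrong results,
// or -1 if the device cannot overlap copies with kernels (or on setup failure).
int run_overlap_demo() {
    int asyncEngines = 0;
    if (cudaDeviceGetAttribute(&asyncEngines, cudaDevAttrAsyncEngineCount, 0) != cudaSuccess ||
        asyncEngines == 0) {
        printf("Device 0 cannot overlap memcpy with kernel execution.\n");
        return -1;
    }

    const int nChunks = 2;          // assumed: one stream per chunk
    const int chunk = 1 << 20;      // assumed chunk size, in elements
    const size_t bytes = chunk * sizeof(float);

    float *h = NULL, *d = NULL;
    // Async copies need page-locked (pinned) host memory.
    if (cudaMallocHost((void **)&h, nChunks * bytes) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&d, nChunks * bytes) != cudaSuccess) {
        cudaFreeHost(h);
        return -1;
    }
    for (int i = 0; i < nChunks * chunk; ++i) h[i] = 1.0f;

    cudaStream_t streams[nChunks];
    for (int k = 0; k < nChunks; ++k) cudaStreamCreate(&streams[k]);

    // Work in different streams may overlap: while chunk 0 is being
    // processed by the kernel, the copy of chunk 1 can be in flight.
    for (int k = 0; k < nChunks; ++k) {
        float *hp = h + k * chunk;
        float *dp = d + k * chunk;
        cudaMemcpyAsync(dp, hp, bytes, cudaMemcpyHostToDevice, streams[k]);
        scale<<<(chunk + 255) / 256, 256, 0, streams[k]>>>(dp, chunk);
        cudaMemcpyAsync(hp, dp, bytes, cudaMemcpyDeviceToHost, streams[k]);
    }
    cudaDeviceSynchronize();        // wait for all streams to finish

    int errors = 0;
    for (int i = 0; i < nChunks * chunk; ++i)
        if (h[i] != 2.0f) ++errors;

    for (int k = 0; k < nChunks; ++k) cudaStreamDestroy(streams[k]);
    cudaFree(d);
    cudaFreeHost(h);
    return errors;
}

int main() {
    int result = run_overlap_demo();
    printf("result: %d\n", result);
    return result == 0 ? 0 : 1;
}
```

Operations within a single stream still run in order, so the overlap only happens between different streams. A profiler timeline is the easiest way to confirm that the transfers and kernels really overlap on your card.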