Clever way to memcpy a partition of a matrix

I am looking for a strided version of cudaMemcpy() that would let me copy a submatrix of a host matrix into a (smaller) GPU matrix.
I can't believe that issuing a separate copy for each row would be very efficient.
Thanks in advance

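Take a look at cudaMemcpy2D(). It takes separate source and destination pitches (row strides in bytes), so a submatrix can be copied in a single call.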
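
For illustration, here is a minimal sketch of how that could look. It assumes a row-major float matrix on the host; the helper names copySubmatrixToDevice and submatrixRoundTripOk, and the sizes used in the check, are made up for this example and are not part of the CUDA API. Only cudaMallocPitch(), cudaMemcpy2D() and cudaFree() are actual runtime calls.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies the sub_rows x sub_cols block whose top-left
// element is (row0, col0) of a row-major host matrix with host_cols columns
// into a freshly allocated pitched device buffer.
cudaError_t copySubmatrixToDevice(const float* h_mat, size_t host_cols,
                                  size_t row0, size_t col0,
                                  size_t sub_rows, size_t sub_cols,
                                  float** d_sub, size_t* d_pitch)
{
    // Let the runtime choose a row pitch (in bytes) for the device buffer.
    cudaError_t err = cudaMallocPitch((void**)d_sub, d_pitch,
                                      sub_cols * sizeof(float), sub_rows);
    if (err != cudaSuccess) return err;

    // First element of the submatrix inside the larger host matrix.
    const float* src = h_mat + row0 * host_cols + col0;

    // Source pitch is the full host row; width is the submatrix row, in bytes.
    err = cudaMemcpy2D(*d_sub, *d_pitch,
                       src, host_cols * sizeof(float),
                       sub_cols * sizeof(float), sub_rows,
                       cudaMemcpyHostToDevice);
    if (err != cudaSuccess) {
        cudaFree(*d_sub);
        *d_sub = nullptr;
    }
    return err;
}

// Hypothetical self-check: copy a 4x5 block out of an 8x10 matrix, copy it
// back into a tightly packed host buffer, and compare element by element.
bool submatrixRoundTripOk()
{
    const size_t rows = 8, cols = 10, r0 = 2, c0 = 3, sr = 4, sc = 5;
    std::vector<float> h(rows * cols);
    for (size_t i = 0; i < h.size(); ++i) h[i] = (float)i;

    float* d = nullptr;
    size_t pitch = 0;
    if (copySubmatrixToDevice(h.data(), cols, r0, c0, sr, sc, &d, &pitch)
            != cudaSuccess)
        return false;

    std::vector<float> back(sr * sc);
    cudaError_t err = cudaMemcpy2D(back.data(), sc * sizeof(float),
                                   d, pitch,
                                   sc * sizeof(float), sr,
                                   cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return false;

    for (size_t r = 0; r < sr; ++r)
        for (size_t c = 0; c < sc; ++c)
            if (back[r * sc + c] != h[(r0 + r) * cols + (c0 + c)])
                return false;
    return true;
}
```

Note that the width and both pitches are given in bytes, not elements. If your matrix is column-major (as with CUBLAS), the same call should work with the roles swapped: the "width" becomes the number of rows of the submatrix in bytes, the "height" becomes its number of columns, and the source pitch is the leading dimension in bytes.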