cudaMemcpy

I am doing image conversion using CUDA,
and I have a question about cudaMemcpy.

As far as I know, using CUDA requires copying data between the host and the device.
Here are the results of my experiment:
                       time required    data size
host -> device copy    1 sec            210 KB
device -> host copy    10 sec           200 KB
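
For reference, here is a minimal sketch of how copies of these sizes could be timed with CUDA events. It is an assumed setup, not necessarily the exact code behind the numbers above: the buffer names and the `CHECK` macro are hypothetical, and the image-conversion kernel is left out. The `cudaDeviceSynchronize()` call before the second measurement is there so that any kernel still running is not counted as part of the device -> host copy.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (line %d)\n",             \
                    cudaGetErrorString(err_), __LINE__);              \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main() {
    const size_t h2dBytes = 210 * 1024;  // host -> device size from the table
    const size_t d2hBytes = 200 * 1024;  // device -> host size from the table

    // Ordinary pageable host buffers (assumed; names are illustrative).
    unsigned char *hostIn  = (unsigned char *)malloc(h2dBytes);
    unsigned char *hostOut = (unsigned char *)malloc(d2hBytes);
    if (!hostIn || !hostOut) return EXIT_FAILURE;
    memset(hostIn, 0, h2dBytes);

    unsigned char *dev = nullptr;
    CHECK(cudaMalloc((void **)&dev, h2dBytes));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    float ms = 0.0f;

    // Time the host -> device copy.
    CHECK(cudaEventRecord(start));
    CHECK(cudaMemcpy(dev, hostIn, h2dBytes, cudaMemcpyHostToDevice));
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    printf("host -> device: %.3f ms\n", ms);

    // The image-conversion kernel would be launched here (omitted).
    // Wait for all device work to finish before timing the next copy.
    CHECK(cudaDeviceSynchronize());

    // Time the device -> host copy.
    CHECK(cudaEventRecord(start));
    CHECK(cudaMemcpy(hostOut, dev, d2hBytes, cudaMemcpyDeviceToHost));
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    printf("device -> host: %.3f ms\n", ms);

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(dev));
    free(hostIn);
    free(hostOut);
    return 0;
}
```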

The host-to-device copy transfers more data than the device-to-host copy, but it takes less time.

Do you know the reason for this?

If you know the reason, please let me know.
If you know a way to improve the device-to-host copy speed, please let me know that too.

I look forward to your answer.
Thank you.^^