Why does CudaMemcpyH2D take so much time?