[closed] Speed up copying a buffer to pinned memory


Does anyone know how to speed up a memcpy from an ordinary host buffer into pinned memory?

Here is what my code does:

  1. A frame arrives as a uchar buffer

  2. memcpy the buffer to a pinned buffer (allocated with cudaMallocHost) <-- slow (200 ms per 25 MB frame buffer)

  3. cudaMemcpy from the pinned buffer to the device <-- fast

  4. CUDA operations <-- fast
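For scale, 200 ms for a 25 MB frame works out to roughly 125 MB/s for step 2. Below is a minimal sketch in plain C for timing that copy on its own and converting the result to a throughput figure. The function names `bandwidth_mb_per_s` and `timed_memcpy` are hypothetical and not part of my actual code; in the real pipeline `dst` would be the pointer returned by cudaMallocHost, while here any writable buffer serves as a stand-in.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Throughput in MB/s (1 MB = 1e6 bytes) for `bytes` copied in `seconds`. */
double bandwidth_mb_per_s(double bytes, double seconds)
{
    return bytes / seconds / 1.0e6;
}

/* Copy n bytes from src to dst and return the elapsed wall-clock time
 * in seconds. Uses C11 timespec_get, which reads the system clock
 * (not a monotonic clock), so treat single measurements with care. */
double timed_memcpy(void *dst, const void *src, size_t n)
{
    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);
    memcpy(dst, src, n);
    timespec_get(&t1, TIME_UTC);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1.0e9;
}
```

Calling `timed_memcpy(pinned, frame, size)` once per frame and passing the result to `bandwidth_mb_per_s(size, seconds)` gives a number that can be compared between a pinned destination and an ordinary malloc'ed one, which helps show whether the slowdown is tied to the pinned allocation.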

Ubuntu 14, Tegra3, CUDA 6.5

Best regards, Viktor.

The answer is here: https://devtalk.nvidia.com/default/topic/947488/jetson-tk1/slow-copy-memcpy-26mb-190ms-it-is-normal-bandwidth-/?offset=7#4921965