Does anyone know a way to speed up memcpy from a regular buffer to pinned memory?
Here is what my code does:
a frame arrives as a uchar buffer
memcpy the buffer to a pinned buffer (allocated with cudaMallocHost) <-- slow (200 ms per 25 MB frame buffer)
cudaMemcpy from the pinned buffer to the device <-- fast
CUDA operations <-- fast
Environment: Ubuntu 14, Tegra3, CUDA 6.5
Best regards, Viktor.