Slow memory copy from Device to Host with NvBufSurfaceMap API

Hi, I am trying to use NvBufSurfaceMap to transform a DeepStream surface buffer on the Jetson platform (Xavier AGX), and I have found that it performs very poorly (2 GiB/s) because mappedAddr.addr of the mapped surface is pageable memory. Can anyone from NVIDIA confirm this and suggest a solution?
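
For anyone who wants to reproduce a throughput figure like the one above, here is a minimal sketch of how it could be measured. It only times plain memcpy() calls; the helper name copy_gibps is hypothetical, and in the real case `src` would be the pointer taken from mappedAddr.addr[0] after NvBufSurfaceMap() rather than an ordinary heap buffer.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: run `iters` memcpy() calls of `bytes` each and
 * return the average throughput in GiB/s (0.0 if no time elapsed). */
static double copy_gibps(void *dst, const void *src, size_t bytes, int iters)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++) {
        memcpy(dst, src, bytes);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    double gib  = (double)bytes * (double)iters / (1024.0 * 1024.0 * 1024.0);

    return secs > 0.0 ? gib / secs : 0.0;
}
```

Calling it once with two malloc() buffers and once with the mapped surface pointer as `src` gives a rough comparison between ordinary memory and the mapped memory. Read something back from `dst` afterwards so the compiler cannot drop the copies.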

If you allocate a CPU buffer and copy into it by calling memcpy(), the performance can be capped by the CPU. We suggest creating an NvBufSurface so that you can call NvBufSurfTransform() to copy the data to another buffer. It uses the hardware VIC engine and is fast. You can then call NvBufSurfaceMap() on that buffer to get a CPU-accessible pointer.
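
A rough sketch of the suggested flow, not a definitive implementation. It assumes a Jetson with the DeepStream headers installed and a source surface that holds a single frame (batch size 1). The helper name copy_with_vic is hypothetical, as are the choices to reuse the source's format and size and to use pitch-linear layout. It has not been measured here, so treat it as a starting point.

```c
#include <stddef.h>
#include "nvbufsurface.h"
#include "nvbufsurftransform.h"

/* Hypothetical helper: copy the single frame in `src` into a new surface
 * using the VIC, then map the copy for CPU reads.
 * Returns the mapped destination surface, or NULL on failure. */
static NvBufSurface *copy_with_vic(NvBufSurface *src)
{
    NvBufSurface *dst = NULL;

    /* Destination matches the source frame (an assumption of this sketch). */
    NvBufSurfaceCreateParams cp = {0};
    cp.gpuId       = src->gpuId;
    cp.width       = src->surfaceList[0].width;
    cp.height      = src->surfaceList[0].height;
    cp.colorFormat = src->surfaceList[0].colorFormat;
    cp.layout      = NVBUF_LAYOUT_PITCH;
    cp.memType     = NVBUF_MEM_SURFACE_ARRAY;  /* Jetson hardware buffer */

    if (NvBufSurfaceCreate(&dst, 1, &cp) != 0) {
        return NULL;
    }
    dst->numFilled = 1;

    /* Ask for the VIC engine rather than the GPU for this thread's session. */
    NvBufSurfTransformConfigParams cfg = {0};
    cfg.compute_mode = NvBufSurfTransformCompute_VIC;
    cfg.gpu_id       = src->gpuId;

    NvBufSurfTransformParams tp = {0};
    tp.transform_flag   = NVBUFSURF_TRANSFORM_FILTER;
    tp.transform_filter = NvBufSurfTransformInter_Default;

    if (NvBufSurfTransformSetSessionParams(&cfg) != NvBufSurfTransformError_Success ||
        NvBufSurfTransform(src, dst, &tp) != NvBufSurfTransformError_Success ||
        NvBufSurfaceMap(dst, 0, -1, NVBUF_MAP_READ) != 0) {
        NvBufSurfaceDestroy(dst);
        return NULL;
    }

    /* Make the hardware-written data visible to the CPU before reading. */
    NvBufSurfaceSyncForCpu(dst, 0, -1);
    return dst;
}
```

After a successful call, dst->surfaceList[0].mappedAddr.addr[0] points at plane 0 of the copy. When finished, call NvBufSurfaceUnMap(dst, 0, -1) and then NvBufSurfaceDestroy(dst).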