Slow memory copy from Device to Host with NvBufSurfaceMap API

Hi, I am trying to use NvBufSurfaceMap to copy a DeepStream surface buffer from device to host on a Jetson platform (Xavier AGX), and I found that the copy is very slow (about 2 GiB/s). I believe this is because mappedAddr.addr of the mapped surface points to pageable memory. Can anyone from NVIDIA confirm this and suggest a solution?

If you allocate a CPU buffer and copy into it by calling memcpy(), the throughput can be capped by the CPU. We would suggest creating a second NvBufSurface and calling NvBufSurfTransform() to copy the data into that buffer. This uses the hardware VIC engine and is fast. You can then call NvBufSurfaceMap() on the destination surface to get a CPU-accessible pointer.
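To see roughly where the CPU cap mentioned above sits on a given board, it can help to time a plain memcpy() between two ordinary host buffers and compare the result with the figure measured on the mapped surface. The following is a minimal sketch, not part of the NvBufSurface API: the function name memcpy_gib_per_s is hypothetical, and it uses clock(), which reports process CPU time and only approximates elapsed time for a single-threaded copy loop.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper (not an NVIDIA API): time `iters` memcpy() calls of
 * `bytes` each between two heap buffers and return the throughput in GiB/s.
 * Returns a negative value on bad arguments, allocation failure, or if the
 * run was too short to time. */
static double memcpy_gib_per_s(size_t bytes, int iters)
{
    if (bytes == 0 || iters <= 0)
        return -1.0;

    unsigned char *src = malloc(bytes);
    unsigned char *dst = malloc(bytes);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch every page first so page faults are not part of the timing. */
    memset(src, 0xA5, bytes);
    memset(dst, 0x00, bytes);

    volatile unsigned char sink = 0;
    clock_t t0 = clock();
    for (int i = 0; i < iters; i++) {
        memcpy(dst, src, bytes);
        /* Read one copied byte through a volatile so the copies stay live. */
        sink = dst[bytes - 1];
    }
    clock_t t1 = clock();
    (void)sink;

    int copied_ok = (memcmp(src, dst, bytes) == 0);
    free(src);
    free(dst);

    double secs = (double)(t1 - t0) / (double)CLOCKS_PER_SEC;
    if (!copied_ok || secs <= 0.0)
        return -1.0;

    double gib = ((double)bytes * (double)iters) / (1024.0 * 1024.0 * 1024.0);
    return gib / secs;
}
```

For example, calling memcpy_gib_per_s((size_t)64 << 20, 20) and printing the result gives a host-to-host baseline. If that baseline is close to the number seen when copying out of mappedAddr.addr, the copy is limited by the CPU rather than by the mapping, which is the case where moving the copy to NvBufSurfTransform() would be expected to help.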