I want to know whether we can use DMA to copy data from the CPU to the GPU instead of memcpy. Do the CUDA libraries support DMA? How does the performance of the two methods compare? Any help on how to use DMA on the Jetson TK1 would be appreciated.
IIRC, the memory isn't separate between the CPU and GPU on the TK1 (they share the same physical DRAM), so I'm not sure DMA is even necessary. Perhaps this article will help: http://arrayfire.com/zero-copy-on-tegra-k1/
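For reference, here is a rough sketch of what "zero-copy" looks like with the CUDA runtime, assuming a CUDA 6.x toolkit on the TK1. The buffer is allocated once as mapped, page-locked host memory (`cudaHostAlloc` with `cudaHostAllocMapped`), and the kernel works on it through the device pointer from `cudaHostGetDevicePointer`, so no `cudaMemcpy` is issued. The `scale` kernel and the `run_zero_copy` helper are hypothetical names for illustration only. Whether this beats `cudaMemcpy` depends on your access pattern, so time both on the board.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example kernel: scales each element in place. It reads and
// writes the mapped host buffer directly, so no cudaMemcpy is needed.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: returns the number of wrong elements, or -1 on a CUDA error.
int run_zero_copy(int n)
{
    // Mapped host memory must be enabled before the CUDA context is created.
    // If a context already exists, this call may report
    // cudaErrorSetOnActiveProcess, which we tolerate here.
    cudaError_t err = cudaSetDeviceFlags(cudaDeviceMapHost);
    if (err != cudaSuccess && err != cudaErrorSetOnActiveProcess) return -1;
    cudaGetLastError(); // clear the tolerated error, if any

    // Page-locked host buffer that is also mapped into the GPU address space.
    float *h_data = NULL;
    if (cudaHostAlloc((void **)&h_data, n * sizeof(float), cudaHostAllocMapped) != cudaSuccess)
        return -1;
    for (int i = 0; i < n; ++i) h_data[i] = (float)i;

    // Device-side alias of the same memory; no copy takes place.
    float *d_data = NULL;
    if (cudaHostGetDevicePointer((void **)&d_data, h_data, 0) != cudaSuccess) {
        cudaFreeHost(h_data);
        return -1;
    }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(d_data, n, 2.0f);

    // Launches are asynchronous: check for launch errors, then wait for the
    // kernel to finish before the CPU reads the buffer.
    if (cudaGetLastError() != cudaSuccess || cudaDeviceSynchronize() != cudaSuccess) {
        cudaFreeHost(h_data);
        return -1;
    }

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_data[i] != 2.0f * (float)i) ++mismatches;

    cudaFreeHost(h_data);
    return mismatches;
}

int main()
{
    int result = run_zero_copy(1 << 20);
    if (result != 0) {
        printf("zero-copy test failed (%d)\n", result);
        return 1;
    }
    printf("zero-copy OK\n");
    return 0;
}
```

To compare against the memcpy approach, you could time this path and a `cudaMalloc` + `cudaMemcpy` version with `cudaEvent_t` timers for your real data sizes.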
The article came from this thread: https://devtalk.nvidia.com/default/topic/781450/jetson-tk1-latency-too-high/