Using DMA for copying the data

I want to know whether we can use DMA to copy data from the CPU to the GPU instead of memcpy. Do the CUDA libraries support DMA? How does the performance of the two methods compare? Any help on how to use DMA on the Jetson TK1 would be appreciated.
IIRC, the CPU and GPU on the TK1 share the same physical memory rather than having separate memories, so I'm not sure DMA is even necessary. Perhaps this article might help you:


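Since the memory is shared, one way to skip the copy entirely is mapped ("zero-copy") pinned memory. Below is a minimal sketch of that idea, not a tuned implementation. It assumes the CUDA runtime API is available on the board; the kernel `scale`, the helper `run_zero_copy`, and the `CHECK` macro are hypothetical names made up for illustration. As far as I know, a regular `cudaMemcpy` from pinned host memory is already carried out by the GPU's copy (DMA) engine, so the alternative shown here is avoiding the copy rather than replacing it with a different DMA call.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper macro: print the CUDA error and bail out of the function.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,         \
                    cudaGetErrorString(err_));                         \
            return -1;                                                 \
        }                                                              \
    } while (0)

// Illustrative kernel: multiply every element by a constant factor.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Returns the number of wrong elements (0 on success), or -1 on a CUDA error.
int run_zero_copy(int n)
{
    // Allow host allocations to be mapped into the GPU address space.
    // This has to be set before the CUDA context is created.
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));

    // Pinned host buffer that the GPU can access directly.
    float *h_data = NULL;
    CHECK(cudaHostAlloc((void **)&h_data, n * sizeof(float), cudaHostAllocMapped));
    for (int i = 0; i < n; ++i) {
        h_data[i] = 1.0f;
    }

    // Device-side pointer to the same buffer; no cudaMemcpy is issued.
    float *d_alias = NULL;
    CHECK(cudaHostGetDevicePointer((void **)&d_alias, h_data, 0));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(d_alias, n, 2.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());  // make the GPU writes visible to the CPU

    // The CPU reads the results straight from the same buffer.
    int bad = 0;
    for (int i = 0; i < n; ++i) {
        if (h_data[i] != 2.0f) {
            ++bad;
        }
    }

    CHECK(cudaFreeHost(h_data));
    return bad;
}

int main()
{
    int rc = run_zero_copy(1 << 20);
    printf("zero-copy check: %s\n", rc == 0 ? "ok" : "failed");
    return rc == 0 ? 0 : 1;
}
```

Whether this beats an explicit `cudaMemcpy` depends on the access pattern, since mapped memory may not be cached the same way as device allocations, so it is worth timing both variants on your own workload.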