How to use DMA in CUDA

Hi, I'm a new CUDA developer and I'm having problems with high host-to-device transfer times that are wiping out the speedups I get from running on the GPU.
I'm interested in looking at DMA. Can anyone help me with DMA, or point me to any reference manuals?
Thank you very much.

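Look at the pinned (page-locked) host memory option. Pinned memory can be allocated with cudaMallocHost(), and you can then use cudaMemcpyAsync() to copy to and from the device (the GPU will DMA the memory directly). Note that cudaMemcpyAsync() is only truly asynchronous when the host buffer is pinned.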
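Here is a rough sketch of what that looks like in practice, in case it helps. The kernel `scale` and the helper `pinned_roundtrip` are names I made up for illustration; only the cuda* calls are the real runtime API. It allocates a pinned host buffer, queues the copy in, the kernel, and the copy out on one stream, then waits on the stream before reading the result.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: doubles each element in place.
__global__ void scale(float *d, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

// Hypothetical helper: sends n floats to the GPU and back through a pinned
// host buffer. Returns true if every element came back doubled.
// Assumes n is below 2^24 so the float comparison at the end is exact.
bool pinned_roundtrip(size_t n) {
    const size_t bytes = n * sizeof(float);
    float *h = nullptr;   // pinned (page-locked) host buffer
    float *d = nullptr;   // device buffer
    cudaStream_t stream = nullptr;
    bool ok = false;

    if (cudaMallocHost((void **)&h, bytes) != cudaSuccess) return false;
    if (cudaMalloc((void **)&d, bytes) == cudaSuccess &&
        cudaStreamCreate(&stream) == cudaSuccess) {
        for (size_t i = 0; i < n; ++i) h[i] = (float)i;

        // All three operations are queued on the same stream, so they run
        // in order on the GPU while the host thread is free to continue.
        cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, stream);
        const unsigned int threads = 256;
        const unsigned int blocks = (unsigned int)((n + threads - 1) / threads);
        scale<<<blocks, threads, 0, stream>>>(d, n);
        cudaMemcpyAsync(h, d, bytes, cudaMemcpyDeviceToHost, stream);

        // The host must wait here before touching h again.
        ok = (cudaStreamSynchronize(stream) == cudaSuccess);
        for (size_t i = 0; ok && i < n; ++i) ok = (h[i] == 2.0f * (float)i);
    }

    if (stream) cudaStreamDestroy(stream);
    if (d) cudaFree(d);
    cudaFreeHost(h);      // pinned memory is released with cudaFreeHost, not free
    return ok;
}
```

Call `pinned_roundtrip(1 << 20)` from your own main() and build with nvcc. Keep in mind that pinned memory is a limited resource: pinning very large buffers can slow down the rest of the system, so pin only the buffers you actually transfer.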