How to force CUDA to use DMA for memcpy

grigorymakarevich · January 12, 2018, 7:06pm

Hi,

I am running samples/1_Utilities/bandwidthTest. I see great performance for DeviceToHost and HostToDevice operations.

However, if instead of RAM I use the MMIO space of some other device, the performance drops dramatically. I also observe that in this case the CPU is used for the copy instead of DMA (which is actually what makes the performance so bad…). Is it possible to force CUDA to still use DMA instead of the CPU for that copy?

Thanks!

Robert_Crovella · January 13, 2018, 2:31am

The basic OS driver model generally prevents PCI device A from writing directly to a buffer owned by PCI device B without doing some special things (in the drivers).

If you want to transfer data directly to/from a PCI device that is on the same PCI fabric as a GPU, then the defined method for that is GPUDirect RDMA:

• RDMA for GPUDirect doc page (GPUDirect RDMA :: CUDA Toolkit Documentation)
• GDRCopy GitHub project (https://github.com/NVIDIA/gdrcopy)

This assumes you have access to the driver source code for your device and are a reasonably proficient driver writer for the OS in question.

Unless you’ve done that, CUDA cannot write directly to your device. It will write to system memory instead, and if that memory is not pinned, the maximum transfer speed cannot be achieved.
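
To illustrate the pinned-memory point, here is a minimal sketch (not taken from bandwidthTest itself) that times the same host-to-device copy from a pageable buffer and from a pinned buffer. The helper name h2d_bandwidth_gbs, the CHECK macro, and the 64 MiB size are assumptions made up for this example; only the CUDA runtime calls are real API. It does not touch MMIO space or GPUDirect RDMA; it only covers the system-memory path.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical helper: time one host-to-device copy and return GB/s.
static float h2d_bandwidth_gbs(void* dst_dev, const void* src_host, size_t bytes) {
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    CHECK(cudaEventRecord(start));
    CHECK(cudaMemcpy(dst_dev, src_host, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));

    return (static_cast<float>(bytes) / 1e9f) / (ms / 1e3f);
}

int main() {
    const size_t bytes = 64u << 20;  // 64 MiB, an arbitrary size for the sketch

    void* pageable = malloc(bytes);  // ordinary pageable host memory
    void* pinned = nullptr;          // page-locked host memory
    void* dev = nullptr;             // device buffer
    if (pageable == nullptr) {
        fprintf(stderr, "malloc failed\n");
        return EXIT_FAILURE;
    }
    CHECK(cudaMallocHost(&pinned, bytes));
    CHECK(cudaMalloc(&dev, bytes));

    // Touch both host buffers so their pages are actually backed by memory.
    memset(pageable, 1, bytes);
    memset(pinned, 1, bytes);

    // One untimed copy to absorb one-time initialization cost.
    CHECK(cudaMemcpy(dev, pinned, bytes, cudaMemcpyHostToDevice));

    printf("pageable H2D: %.2f GB/s\n", h2d_bandwidth_gbs(dev, pageable, bytes));
    printf("pinned   H2D: %.2f GB/s\n", h2d_bandwidth_gbs(dev, pinned, bytes));

    CHECK(cudaFree(dev));
    CHECK(cudaFreeHost(pinned));
    free(pageable);
    return 0;
}
```

On most systems the pinned number should come out higher, but the actual figures depend on the platform, so treat this as a way to check your own setup rather than as a benchmark.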

shaklee3 · January 15, 2018, 5:20am

txbob, I noticed that GDRCopy claims it can be faster than cudaMemcpy. Have you tried this yourself? I tried the sample benchmarks and it performed worse.

edit: sorry, I forgot I asked here, so I posted some results in another thread.

nunez.juan · November 27, 2018, 6:38pm

And which thread did you post to?
