Until now we have used CUDA to evaluate whether the performance of GPUs is good enough to replace an existing FPGA platform. The results are very satisfying, and now we want to move on from synthetic tests to a real integration into the existing environment. The data that needs to be processed must be pushed from a specific PCI-E device (from here on simply called 'board') to a CUDA-enabled GPU. There are essentially two ways to do that:
First one:
        DMA            DMA
board --------> RAM --------> GPU
Second one:
        DMA
board --------> GPU
The first one should be possible without much trouble. The CUDA application allocates pinned memory with cudaMallocHost(), and the resulting pointer is handed over to a driver, which translates it to a physical address and initializes the board's DMA engine via programmed I/O. But the host RAM would be a bottleneck.
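To make the first path concrete, here is a minimal sketch of the host-side code, assuming a hypothetical board driver (the /dev/board node and the BOARD_IOC_SET_DMA_TARGET ioctl are placeholders for whatever programmed-I/O interface the real driver exposes); the CUDA calls themselves are standard:

#include <cuda_runtime.h>

#define BUF_SIZE (16 * 1024 * 1024)  /* 16 MiB staging buffer */

int main(void)
{
    void *pinned = NULL;   /* page-locked host buffer; DMA target of the board */
    void *dev    = NULL;   /* final destination in GPU memory */
    cudaStream_t stream;

    /* Pinned allocation: the pages stay resident, so both the board's
       DMA engine and the GPU's copy engine can address them safely. */
    cudaMallocHost(&pinned, BUF_SIZE);
    cudaMalloc(&dev, BUF_SIZE);
    cudaStreamCreate(&stream);

    /* Hypothetical driver step: hand the buffer to the board driver,
       which translates the virtual address to physical pages and
       programs the board's DMA engine via programmed I/O, e.g.
           fd = open("/dev/board", O_RDWR);
           ioctl(fd, BOARD_IOC_SET_DMA_TARGET, pinned);
       then wait until the board signals completion of the first DMA. */

    /* Second DMA hop: asynchronous copy host RAM -> GPU. Because the
       buffer is pinned, this is a true DMA transfer. */
    cudaMemcpyAsync(dev, pinned, BUF_SIZE, cudaMemcpyHostToDevice, stream);
    cudaStreamSynchronize(stream);

    /* ... launch kernels that consume 'dev' ... */

    cudaStreamDestroy(stream);
    cudaFree(dev);
    cudaFreeHost(pinned);
    return 0;
}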
The second one is more complicated, but avoids the RAM bottleneck. The board has on-board DRAM, which would have to be mapped into the address space of the host. The addresses of the mapped DRAM would then have to be usable with cudaMallocHost(), or rather with cudaMemcpyAsync(). But is this possible at all? Is there any way to accomplish that? Has anyone tried to do this (successfully or not)?
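For reference, this is roughly what the second path would have to look like; the mmap() of the board's DRAM through the hypothetical /dev/board node again stands in for your own driver, and whether cudaMemcpyAsync() will accept (and DMA directly from) such a pointer is exactly the open question:

#include <cuda_runtime.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#define WIN_SIZE (16 * 1024 * 1024)  /* size of the board's DRAM window */

int main(void)
{
    /* Hypothetical: the board driver exposes its on-board DRAM (a PCI BAR)
       to user space via mmap(). Node name and offset are placeholders. */
    int fd = open("/dev/board", O_RDWR);
    void *board_mem = mmap(NULL, WIN_SIZE, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);

    void *dev = NULL;
    cudaMalloc(&dev, WIN_SIZE);

    /* The crux: 'board_mem' was not allocated by cudaMallocHost(), so the
       CUDA runtime does not know it is mapped I/O memory. Whether the copy
       below works at all, and whether it becomes a single direct DMA instead
       of a staged copy through host RAM, is the question of this thread. */
    cudaMemcpyAsync(dev, board_mem, WIN_SIZE, cudaMemcpyHostToDevice, 0);
    cudaStreamSynchronize(0);

    cudaFree(dev);
    munmap(board_mem, WIN_SIZE);
    close(fd);
    return 0;
}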
I would be happy to get some hints or comments! Thanks in advance.
Sorry for bumping the thread, but has nobody tried to transfer data directly into the GPU from a self-developed device? I cannot believe that we are the only ones trying to do that…
Well, since NVIDIA itself has yet to enable direct Tesla to Quadro transfers without using host memory, I doubt that anyone else has succeeded. It’s obviously something people want, though…
Maybe, just maybe, NVIDIA is finally putting their ideas into code and Tim will jump into this thread and drop one of his subtle hints about upcoming features… but I wouldn't hold my breath. CUDA 2.2 completely revamped the way pinned memory is handled in the driver, and it didn't even bring this feature.
I used the forum search function and did not find any relevant topics (Google is much better), so sorry for opening a new, probably worthless, thread. But maybe someone who has implemented the first possibility (with two DMA transfers) can say something about the achieved bandwidth/latency…