Cuda + DMA + DAQ

ThiagoK · November 13, 2008, 11:23am

I need to process a very large volume of data in real time.
At first I will acquire the data through a DAQ card that has four channels, each sampling at 20 MSamples/s.
The card can transfer the data via DMA, using almost the entire PCI bus bandwidth.
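
To put a rough number on that data rate, here is a minimal sketch of the arithmetic. The channel count and per-channel rate come from the post; the sample width is not stated, so the 1-byte and 2-byte widths below are assumptions for illustration only.

```python
# Values from the post: four channels at 20 MSamples/s each.
CHANNELS = 4
SAMPLE_RATE_HZ = 20_000_000


def raw_rate_mb_per_s(bytes_per_sample: int) -> float:
    """Aggregate acquisition rate in MB/s (1 MB = 1e6 bytes)."""
    return CHANNELS * SAMPLE_RATE_HZ * bytes_per_sample / 1e6


# ASSUMPTION: the post does not give the sample width; 1 and 2 bytes
# (8-bit and 16-bit samples) are hypothetical choices.
for width in (1, 2):
    print(f"{width} byte(s)/sample: {raw_rate_mb_per_s(width):.0f} MB/s")
```

For comparison, a conventional 32-bit, 33 MHz PCI bus has a theoretical peak of roughly 133 MB/s, so whether the stream fits depends heavily on the sample width and on which bus the card actually uses.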

I just discovered CUDA, and before I start running any tests I wanted to know whether anyone has already tried using CUDA for real-time processing
of data coming from a high-speed data acquisition card.

Does anyone think my idea is feasible?

hill_matthew · November 13, 2008, 6:18pm

What do you mean by real-time? Do you mean that you can process the input without being overloaded, or that you can do that and also process samples in/out without latency? If your scenario can tolerate buffering up data and processing it in large blocks (perhaps 1k-1M samples at a time), I would imagine it’s worth investigating the CUDA option. It’s a fast processing system, but as far as I am aware, the memory transfers can slow it all down if you are not able to process data in large-ish blocks.
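
As a rough illustration of what that buffering costs in latency, the sketch below computes how long it takes just to fill one block. It assumes the 20 MSamples/s per-channel rate mentioned in the original question; the function name is hypothetical.

```python
# ASSUMPTION: per-channel sample rate taken from the original question.
SAMPLE_RATE_HZ = 20_000_000


def block_period_s(samples_per_block: int,
                   sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Time needed to acquire one block on one channel, in seconds.

    This is a lower bound on the latency added by block buffering,
    because processing cannot start until the block is complete.
    """
    return samples_per_block / sample_rate_hz


for n in (1_000, 1_000_000):
    print(f"{n:>9} samples -> {block_period_s(n) * 1e3:.3f} ms")
```

At this rate a 1k-sample block fills in about 50 microseconds and a 1M-sample block in about 50 milliseconds, which is the range of buffering delay the application would have to tolerate.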

nervestaple · November 14, 2008, 10:44pm

I would also be greatly interested in this: performing GPU FFTs on data gathered by another PCI Express card, an FPGA analog-to-digital converter. The data transfer to the GPU would need to be done at high speed (but the FFTs themselves can be done in large blocks as long as the data to transform is already in GPU memory). I was toying with the idea of programming the FPGA to use some kind of SLI-like protocol, having it ‘pretend’ to be another video card. I am not familiar with the particulars though, so I don’t know if this would work.

Generally, does anyone know of any way to transfer data quickly over the PCI Express bus without involving the CPU and main memory?

tmurray · November 14, 2008, 11:22pm

We’re investigating how feasible this is.

hill_matthew · November 15, 2008, 12:43am

We could really make use of this kind of thing at work, so when you’ve got some thoughts (probably tmurray or anybody else that tries) I’d be really interested in your feasibility study output :) If it only turns out to be practical using a full Tesla device it would still be useful for this kind of application where cost is often less of an issue.

tmurray · November 15, 2008, 1:00am

One thing that would help us a lot is a list of the specific devices you guys would want to use this with. If you can tell us which specific cards you’re interested in, please do.

ThiagoK · November 17, 2008, 11:04am

For example, I would like to use this family of cards:

http://www.spectrum-instrumentation.com/m2i2020-exp.html
PCI-5152
http://www.adlinktech.com/PD/web/PD_detail.php?cKind=&pid=358&seq=&id=&sid=

I have an application very similar to nervestaple’s.
I would like to transfer a block of data from this A/D card to the GPU without using the CPU or the main memory.
Then I can process the data. My only constraint is that I need to process each block while the next block is being acquired.
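
The "process one block while the next is acquired" requirement is usually met with two buffers that swap roles (ping-pong or double buffering). The following is a minimal CPU-only sketch of that pattern using threads and queues; the acquisition and processing functions are hypothetical stand-ins for the DAQ card and the GPU stage, and the block sizes are arbitrary.

```python
import queue
import threading

NUM_BLOCKS = 5   # hypothetical number of blocks in this demo run
BLOCK_SIZE = 4   # hypothetical samples per block


def acquire(free_q, full_q):
    """Stand-in for the DAQ card: fill whichever buffer is currently free."""
    for n in range(NUM_BLOCKS):
        buf = free_q.get()               # waits until a buffer is free
        for i in range(BLOCK_SIZE):
            buf[i] = n * BLOCK_SIZE + i  # fake samples
        full_q.put(buf)                  # hand the full buffer to processing
    full_q.put(None)                     # end-of-stream marker


def process(free_q, full_q, results):
    """Stand-in for the GPU stage: consume full buffers, then recycle them."""
    while (buf := full_q.get()) is not None:
        results.append(sum(buf))         # placeholder for the real work
        free_q.put(buf)                  # buffer may now be refilled


def run():
    free_q, full_q, results = queue.Queue(), queue.Queue(), []
    for _ in range(2):                   # exactly two buffers: ping and pong
        free_q.put([0] * BLOCK_SIZE)
    producer = threading.Thread(target=acquire, args=(free_q, full_q))
    producer.start()
    process(free_q, full_q, results)
    producer.join()
    return results


print(run())
```

With two buffers, acquisition of block n+1 can proceed while block n is being processed. The scheme only keeps up if processing a block takes no longer than acquiring one; otherwise the acquisition side stalls waiting for a free buffer.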

I’m working on a visualization application. It’s a kind of oscilloscope.

nervestaple · November 17, 2008, 4:11pm

In our case, it’s a home-grown pci-express ADC card based around a Lattice ECP2M FPGA. We need extremely fast samples (and relatively low cost) which is why we weren’t able to use any of the usual commercial options - and why NVIDIA GPUs are attractive to us.

hill_matthew · November 17, 2008, 5:13pm

Ditto, sorta. The thing I have in mind would be to eventually replace or complement a bespoke hardware solution, so if we are guided down a route with specific capture cards that NVIDIA recommends, then it may not be an issue at this stage. These are early research ideas, do it before the competition does, etc.

alex_dubinsky · November 18, 2008, 8:32pm

Doing direct DMA may seem elegant, but I’m not sure if it’s really worthwhile. You can copy into main memory and then on to the GPU at very high bandwidth (much higher bandwidth than almost any input card can sustain), and do the two copies in parallel. It does add a bit of latency, yes, but not much compared to the latency a CUDA solution normally entails.

Doing direct card-to-card is impeded by concerns for security. But it shouldn’t be too hard for the runtime to spit out the bus address of a cudaMalloc’d buffer. Everything else would have to be done by the capture card itself. (NVIDIA, don’t worry about opening up programmability of the GPU’s DMA engine or anything else like that.)

rbulha · June 17, 2013, 1:40pm

Any news on this issue? I’m interested in the possibility of data transfer between a capture card and GPU directly.

njuffa · June 17, 2013, 3:26pm

Have you had a chance to check out GPUDirect? https://developer.nvidia.com/gpudirect

gfoersler · October 27, 2017, 5:46am

Hi,

Is there any chance that GPUDirect might be available for affordable cards?
Like under $1000 USD?

njuffa · October 27, 2017, 8:00am

What kind of cards? Which vendors have you contacted?

I would think there aren’t all that many use cases for which going through system memory is a major performance obstacle. Off the top of my head, I am only aware of Infiniband adapters and video frame grabbers with support for NVIDIA GPUDirect.

gfoersler · October 27, 2017, 12:57pm

By “affordable cards” I meant GPU cards.

If it only works with Tesla or Quadro cards, there is absolutely no sense in building a “home-grown pci-express ADC card based around a Lattice ECP2M FPGA” or the like, that nervestaple and Hill_Matthew are talking about.

njuffa · October 27, 2017, 3:35pm

A bigger problem with a “home-grown pci-express ADC card based around a Lattice ECP2M FPGA” might be that you will have to provide a GPUDirect-enabled Linux driver for it.

There are certainly Quadro cards under $1000, but whether any of them support GPUDirect I do not know. I would suggest you inquire with NVIDIA. FWIW, I think it is unrealistic to expect cheap consumer products to provide all the benefits of professional solutions, in particular when the consumer products already come with support for tons of goodies included.

GPUDirect’s main benefit is in eliminating a system memory to system memory copy, which reduces latency and power consumption. With the advent of multi-channel DDR4 memory subsystems, I would expect both advantages to have diminished somewhat. I would suggest measuring current end-to-end latencies for your use case. From that determine how much of an issue that is, before deciding that you definitely need GPUDirect.
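
A minimal sketch of such a measurement is shown below. It times a single callable per block and reports the median and worst case; `fake_stage` is a hypothetical placeholder for the real acquire, copy, process, and read-back path, which is not specified in this thread.

```python
import statistics
import time


def measure_latency(stage, blocks):
    """Return the wall-clock latency in seconds of stage(block) per block."""
    latencies = []
    for block in blocks:
        start = time.perf_counter()
        stage(block)  # must not return before the result is actually ready
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Median and worst-case latency; the worst case matters for real time."""
    return {"median_s": statistics.median(latencies),
            "max_s": max(latencies)}


def fake_stage(block):
    # HYPOTHETICAL stand-in for the real end-to-end pipeline.
    return sum(block)


stats = summarize(measure_latency(fake_stage, [[1, 2, 3]] * 100))
print(stats)
```

When the stage involves a GPU, it needs to wait for the device to finish before returning, since kernel launches are asynchronous with respect to the host and the timer would otherwise stop too early.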
