Custom PCIe FPGA board - DMA - CUDA


I need to verify the feasibility of a system built around a custom PCI Express board with an onboard FPGA and some memory banks, used for frame grabbing. Each frame must be transferred to the video card for processing by CUDA kernels. The frames are very high resolution and are acquired at a couple of hundred frames per second, so I need very high sustained throughput.
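As a rough sanity check on the throughput, the required rate can be compared with the theoretical PCI Express peak. This is only a sketch: the frame size (2048x2048, 1 byte per pixel), the 200 fps rate, and the x8/x16 Gen2 links are hypothetical numbers chosen for illustration, not the real specs of the board. Achievable throughput is lower than the theoretical peak because of protocol overhead.

```python
def required_bandwidth_gbps(width, height, bytes_per_pixel, fps):
    """Sustained payload rate in GB/s (1 GB = 1e9 bytes) for uncompressed frames."""
    return width * height * bytes_per_pixel * fps / 1e9


def pcie_peak_gbps(lanes, gen):
    """Theoretical per-direction peak in GB/s for PCIe Gen1/Gen2 links."""
    gigatransfers_per_lane = {1: 2.5, 2: 5.0}[gen]
    # Gen1/Gen2 use 8b/10b encoding: 8 payload bits per 10 bits on the wire.
    # Dividing by 8 converts gigabits to gigabytes.
    return lanes * gigatransfers_per_lane * (8 / 10) / 8


if __name__ == "__main__":
    # Hypothetical camera: 2048x2048, 8-bit pixels, 200 frames per second.
    need = required_bandwidth_gbps(2048, 2048, 1, 200)
    for lanes in (8, 16):
        peak = pcie_peak_gbps(lanes, 2)
        print(f"x{lanes} Gen2: need {need:.2f} GB/s, "
              f"theoretical peak {peak:.1f} GB/s, "
              f"utilisation {100 * need / peak:.0f}%")
```

If the frames are staged through host memory instead of going directly to the GPU, each frame crosses PCI Express twice (board to host, then host to GPU), so both links need that headroom.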

I was wondering whether the newest CUDA release has made any progress on PCI Express device-to-device memcpy. The system runs on Windows 7, and custom drivers are already used to communicate with the FPGA board. I saw that GPUDirect exists, but I’m not sure it is what I’m looking for… Basically, I’d like the data to flow in real time from the FPGA board’s memory banks to GPU memory using DMA, without going through a CPU application.

Does anyone have ideas about that? Is it feasible?

Thanks a lot!

You might find the APENet+ project of interest. Its goal is to design a custom FPGA network card with DMA capability to and from CUDA GPUs:

Project page

Some papers describing this work

Interesting project. I’ll take a look at the information available.

Other than that, I saw that GPUDirect for Video I/O was announced last week, and it may be what I’m looking for. If I can find more documentation about it, maybe I’ll be all set :)


I am also interested in learning more about this. Have you found any more documentation?

I looked at GPUDirect for Video. It can only be used with Tesla cards, so I’m out of luck since my system uses GTX cards… Also, the DMA isn’t done directly from the FPGA side to GPU memory. That’s all I know for now…