I have a real-time system, for which I am considering a GPU solution. It consists of a camera frame grabber, a Tesla C1060, and an IO output board. Ideally, there would be no CPU intervention except to start or stop operations.
Operation is envisioned to be as follows (detailed interlocking ignored here except for the GPU):
The frame grabber DMAs a new frame of data into the GPU’s input buffer in global memory. In addition to the camera data, the frame contains a control word, which the GPU will decode.
The GPU will loop in its kernel, periodically looking at the control word in global memory.
When it sees the word, it will do whatever is appropriate: exit if it is “abort”; process the data if it is “buffer ready”; etc.
When it has decoded the control word, it will zero it and start processing or aborting.
When the GPU has finished processing, it will place the processed data into its output buffer, along with a control word such as “output ready” or “error”.
The GPU will now start looping, checking its input control word for its next action.
The output card will be polling the output control word in the GPU’s memory, and when it sees there is data, it will start sending it.
Ideally, the GPU could DMA straight into the output card’s buffer. Can it do that?
Does anyone see any problem with the frame grabber card DMA’ing into the Tesla’s global memory?
I’m really trying to keep the CPU out of the mix if possible, except for setup and teardown of the process. It should be able to run for hours with no intervention.
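To make the handshake in the steps above concrete, here is a minimal host-side sketch in plain C++ of one pass through the polling loop. It is only a model: the `Mailbox` struct stands in for the input and output buffers in global memory, the numeric control-word values are made up (the post names the states but not their encoding), `process` is a placeholder that just inverts each byte, and treating an unknown control word as an error is an assumption. On a real device the control words would presumably also need volatile accesses and memory fences, which this single-threaded model leaves out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoding of the control words named in the post.
enum ControlWord : std::uint32_t {
    kIdle        = 0,  // zeroed word: nothing pending
    kBufferReady = 1,  // a new frame is in the input buffer
    kAbort       = 2,  // leave the polling loop
    kOutputReady = 3,  // processed data is in the output buffer
    kError       = 4   // processing failed
};

// Stand-in for the input and output buffers in GPU global memory.
struct Mailbox {
    std::uint32_t in_ctrl = kIdle;
    std::vector<std::uint8_t> in_data;
    std::uint32_t out_ctrl = kIdle;
    std::vector<std::uint8_t> out_data;
};

// Placeholder for the real processing step: inverts every byte.
// Returns false for an empty frame so the error path can be exercised.
bool process(const std::vector<std::uint8_t>& in, std::vector<std::uint8_t>& out) {
    if (in.empty()) return false;
    out.resize(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(255 - in[i]);
    }
    return true;
}

// One polling iteration. Returns false when the loop should exit (abort).
bool poll_once(Mailbox& mb) {
    const std::uint32_t word = mb.in_ctrl;
    if (word == kIdle) return true;   // nothing to do yet, keep looping
    mb.in_ctrl = kIdle;               // zero the word once it has been decoded
    if (word == kAbort) return false;
    if (word == kBufferReady) {
        const bool ok = process(mb.in_data, mb.out_data);
        mb.out_ctrl = ok ? kOutputReady : kError;
    } else {
        mb.out_ctrl = kError;         // assumption: unknown words are reported as errors
    }
    return true;
}
```

The kernel-side loop would then amount to `while (poll_once(mb)) { }`, with the frame grabber writing `in_data` and `in_ctrl` and the output card watching `out_ctrl`.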
If the CPU is out, who will do the job of NVIDIA’s driver?
Someone needs to know how to talk to the card (program registers, initialize it, download kernels onto it, handle interrupts and errors, manage its memory, and so on).
A lot of people have asked for this. It is technically possible for other PCI-E devices to DMA directly into GPU memory, but we don’t have a solution yet. We’ll keep you posted.
If the memory is pinned and the physical address is known, what is the limiting factor that keeps another device other than the CPU from DMA’ing to that memory? It would seem a driver could do this.
The device memory space is NOT exposed in the PCI memory space.
A Tesla with 4 GB of RAM can still sit in a 32-bit system without eating up the system address space. (On Windows, check the device properties in Device Manager.)
What they probably do is “bank” the memory: the driver points the bank at different portions of device memory and then copies data through it.
So, with this setup, it is NOT possible to DMA directly into device memory, even though the device is PCI compliant.
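For anyone unfamiliar with banking, here is a toy C++ model of the idea guessed at above: a device whose memory is only reachable through a small movable window, and a driver-style copy that re-points the window as it goes. Everything here (`BankedDevice`, `select_bank`, `copy_to_device`, the sizes) is hypothetical and purely illustrative. It is not a description of how NVIDIA's driver or hardware actually works.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy device: mem_ is the full device memory, but the "host" may only
// write through a window of window_bytes_ at the currently selected bank.
class BankedDevice {
public:
    BankedDevice(std::size_t mem_bytes, std::size_t window_bytes)
        : mem_(mem_bytes, 0), window_bytes_(window_bytes) {}

    // Models programming a bank register: choose which slice is visible.
    void select_bank(std::size_t bank) {
        bank_ = bank;
        ++bank_switches_;
    }

    // Host-visible write; offset is relative to the start of the window.
    // at() throws if the resulting address is outside device memory.
    void write_window(std::size_t offset, std::uint8_t value) {
        mem_.at(bank_ * window_bytes_ + offset) = value;
    }

    // Inspection helper for the model only; a real host could not do this.
    std::uint8_t read_absolute(std::size_t addr) const { return mem_.at(addr); }

    std::size_t window_bytes() const { return window_bytes_; }
    std::size_t current_bank() const { return bank_; }
    std::size_t bank_switches() const { return bank_switches_; }

private:
    std::vector<std::uint8_t> mem_;
    std::size_t window_bytes_;
    std::size_t bank_ = 0;
    std::size_t bank_switches_ = 0;
};

// Driver-style copy: split the transfer at window boundaries and
// re-point the window before writing each chunk.
void copy_to_device(BankedDevice& dev, std::size_t dst,
                    const std::uint8_t* src, std::size_t n) {
    while (n > 0) {
        const std::size_t bank = dst / dev.window_bytes();
        const std::size_t offset = dst % dev.window_bytes();
        const std::size_t chunk = std::min(n, dev.window_bytes() - offset);
        dev.select_bank(bank);
        for (std::size_t i = 0; i < chunk; ++i) {
            dev.write_window(offset + i, src[i]);
        }
        dst += chunk;
        src += chunk;
        n -= chunk;
    }
}
```

The point of the model is that a device address only means something in combination with the current bank selection, which only the driver controls. A third-party card has no fixed bus address for an arbitrary location in device memory that it could target.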