I’ve read various posts about this topic and understand that the programmer must explicitly handle multiple GPUs independently. I also understand that it is possible to use cudaMemcpy to move data between devices by way of the host, and even to speed this up by using pinned memory shared between host threads (as of toolkit 2.2).
However, what I would like to do is transfer directly from the GPU memory of one card (say, in an S1070) to another, without having to go through host memory.
Is this already possible and I am not finding the API to do so?
Or, might it be possible in the future with CUDA?
Tangential question, if anyone knows: does OpenCL provide a means of using multiple devices transparently?
AFAIK, though I have not tested it myself (for lack of a test bed with two GPUs that support pinned mapped memory), it seems possible to exchange data between GPUs using pinned mapped memory that is allocated for all GPUs at once.
The drawback is that the data will be written to main memory and read back from it, but its bandwidth is much higher than that of the PCIe x16 bus, so I don’t see that as a problem in itself.
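To make the idea concrete, here is a minimal, untested sketch of two GPUs sharing one pinned mapped buffer. It assumes a toolkit where a single host thread can switch devices with cudaSetDevice (CUDA 4.0 or later; with 2.2 you would need one host thread per GPU), and the kernel names produce and doubler are made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print the CUDA error and bail out of main() on failure.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e = (call);                                       \
        if (e != cudaSuccess) {                                       \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(e));                           \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Hypothetical kernel run on GPU 0: writes into the shared host buffer.
__global__ void produce(int *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = i;
}

// Hypothetical kernel run on GPU 1: reads and updates the same buffer.
__global__ void doubler(int *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] *= 2;
}

int main() {
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    if (count < 2) { printf("this sketch needs two GPUs\n"); return 0; }

    const int n = 1024;

    // Ask for host-memory mapping on both devices before they are used.
    for (int d = 0; d < 2; ++d) {
        CHECK(cudaSetDevice(d));
        CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));
    }

    // Portable: pinned for every CUDA context. Mapped: visible to the GPUs.
    int *h_buf = NULL;
    CHECK(cudaHostAlloc((void **)&h_buf, n * sizeof(int),
                        cudaHostAllocPortable | cudaHostAllocMapped));

    // GPU 0 writes through its own device-side alias of the buffer.
    int *d0_buf = NULL;
    CHECK(cudaSetDevice(0));
    CHECK(cudaHostGetDevicePointer((void **)&d0_buf, h_buf, 0));
    produce<<<(n + 255) / 256, 256>>>(d0_buf, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());  // make GPU 0's writes visible first

    // GPU 1 reads what GPU 0 wrote and updates it in place.
    int *d1_buf = NULL;
    CHECK(cudaSetDevice(1));
    CHECK(cudaHostGetDevicePointer((void **)&d1_buf, h_buf, 0));
    doubler<<<(n + 255) / 256, 256>>>(d1_buf, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    // The host sees the combined result in the same buffer.
    int errors = 0;
    for (int i = 0; i < n; ++i) errors += (h_buf[i] != 2 * i);
    printf("mismatches: %d\n", errors);

    CHECK(cudaFreeHost(h_buf));
    return errors != 0;
}
```

Note that every access from either kernel crosses the PCIe bus to host memory, so this only avoids an explicit copy; it is not a direct GPU-to-GPU transfer.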
Thank you for the reply, iAPAX. I was wondering about peer-to-peer communication between 2 GPU boards, but this can be an alternative solution if the 2 GPU boards can communicate through host memory (via DMA). Thank you for the advice.
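For reference, the plain "through host memory" route can be sketched as two DMA copies staged in a pinned buffer. This is an untested sketch under the same assumption as usual for recent toolkits (one host thread switching devices with cudaSetDevice, CUDA 4.0 or later); the buffer names and the 1 MiB size are arbitrary.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Print the CUDA error and bail out of main() on failure.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e = (call);                                       \
        if (e != cudaSuccess) {                                       \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(e));                           \
            return 1;                                                 \
        }                                                             \
    } while (0)

int main() {
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    if (count < 2) { printf("this sketch needs two GPUs\n"); return 0; }

    const size_t bytes = 1 << 20;  // arbitrary 1 MiB payload

    // Pinned staging buffer, portable so both GPUs can DMA to and from it.
    unsigned char *stage = NULL;
    CHECK(cudaHostAlloc((void **)&stage, bytes, cudaHostAllocPortable));

    // Source buffer on GPU 0, filled with a known byte pattern.
    unsigned char *src = NULL;
    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc((void **)&src, bytes));
    CHECK(cudaMemset(src, 0x5A, bytes));

    // Hop 1: GPU 0 -> pinned host memory.
    CHECK(cudaMemcpy(stage, src, bytes, cudaMemcpyDeviceToHost));

    // Hop 2: pinned host memory -> GPU 1.
    unsigned char *dst = NULL;
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc((void **)&dst, bytes));
    CHECK(cudaMemcpy(dst, stage, bytes, cudaMemcpyHostToDevice));

    // Verify: clear the staging buffer, read GPU 1's copy back, compare.
    memset(stage, 0, bytes);
    CHECK(cudaMemcpy(stage, dst, bytes, cudaMemcpyDeviceToHost));
    size_t bad = 0;
    for (size_t i = 0; i < bytes; ++i) bad += (stage[i] != 0x5A);
    printf("mismatched bytes: %zu\n", bad);

    CHECK(cudaFree(dst));
    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(src));
    CHECK(cudaFreeHost(stage));
    return bad != 0;
}
```

The host thread only issues the two copies; the data movement itself is done by each GPU's DMA engine against the pinned buffer.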
Sorry for the beginner’s question: if both boards are GPU boards, then using the CUDA API we can allocate pinned mapped memory in host memory, and the 2 boards can share it and DMA data through it (correct?). I assume it is a virtual address returned by CUDA that the GPU(s) can actually access (correct?).
I’m also considering whether our custom FPGA PCIe board can send data to the GPU board via DMA (we want to avoid having the host software do this, for performance reasons). In the previous example of 2 GPU boards, I assume both boards can share and read/write the virtual memory space of the host pinned mapped memory allocated by CUDA. Does CUDA have a way to return the physical address of that memory, or can we specify its physical base address, so that a 3rd, non-GPU board (e.g. an FPGA board) can share the same pinned mapped memory space?