Direct memory transfer from GPU to GPU?

Hi all

I am developing a dual-GPU system in which a data set is processed first by GPU-1 and then by GPU-2.

Currently I allocate a buffer on the host, copy GPU-1's output into it with a Device-to-Host copy, and then transfer it to GPU-2 with a Host-to-Device copy.

This does not seem very efficient, given the PCIe bandwidth limit. Is there some way to transfer memory directly from GPU to GPU? What about SLI? Does CUDA support SLI?
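For reference, the staging pattern described above can at least be made faster by using pinned (page-locked) host memory, splitting the transfer into chunks, and double-buffering so that draining chunk i+1 from GPU-1 overlaps with feeding chunk i to GPU-2. This is a minimal sketch under assumed buffer names and sizes, and it assumes a CUDA runtime where one host thread can switch between devices with `cudaSetDevice`; error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>

#define CHUNK   (1 << 20)   /* elements per chunk (assumed size) */
#define CHUNKS  8           /* number of chunks (assumed) */

int main(void)
{
    float *h_stage[2], *d_src, *d_dst;
    cudaStream_t s_out, s_in;

    /* Two pinned staging buffers: async copies require pinned memory,
       and two buffers allow the D2H and H2D copies to overlap. */
    cudaMallocHost((void **)&h_stage[0], CHUNK * sizeof(float));
    cudaMallocHost((void **)&h_stage[1], CHUNK * sizeof(float));

    cudaSetDevice(0);                                /* GPU-1 */
    cudaMalloc((void **)&d_src, (size_t)CHUNKS * CHUNK * sizeof(float));
    cudaStreamCreate(&s_out);

    cudaSetDevice(1);                                /* GPU-2 */
    cudaMalloc((void **)&d_dst, (size_t)CHUNKS * CHUNK * sizeof(float));
    cudaStreamCreate(&s_in);

    for (int i = 0; i < CHUNKS; ++i) {
        float *buf = h_stage[i & 1];                 /* alternate buffers */

        cudaSetDevice(0);                            /* drain chunk i from GPU-1 */
        cudaMemcpyAsync(buf, d_src + (size_t)i * CHUNK,
                        CHUNK * sizeof(float), cudaMemcpyDeviceToHost, s_out);
        cudaStreamSynchronize(s_out);                /* chunk i is now on the host */

        cudaSetDevice(1);                            /* feed chunk i to GPU-2 */
        cudaMemcpyAsync(d_dst + (size_t)i * CHUNK, buf,
                        CHUNK * sizeof(float), cudaMemcpyHostToDevice, s_in);
        /* the next iteration uses the other staging buffer, so this H2D
           copy can overlap the D2H copy of chunk i+1 */
    }
    cudaSetDevice(1);
    cudaStreamSynchronize(s_in);                     /* wait for the last chunk */
    return 0;
}
```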



GPUDirect Technology can help you if your system satisfies its requirements.
It uses a shared pinned memory buffer to do the copy.

GPUDirect wouldn’t help here: both endpoints are GPUs, and two GPUs in the same system can already share pinned system memory via the CUDA API.
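To illustrate what "sharing pinned system memory" can look like in practice: a single pinned buffer allocated with the `cudaHostAllocPortable` flag is visible to every CUDA context, so GPU-1's Device-to-Host copy and GPU-2's Host-to-Device copy can use the very same staging buffer, with no extra host-side copy in between. This is a hedged sketch with made-up sizes and device ordinals, not a complete program; error checking is omitted.

```cuda
#include <cuda_runtime.h>

int main(void)
{
    const size_t bytes = 1 << 22;   /* assumed transfer size */
    float *h_shared, *d_src, *d_dst;

    /* cudaHostAllocPortable makes this pinned buffer usable from all
       CUDA contexts, not just the one that allocated it. */
    cudaHostAlloc((void **)&h_shared, bytes, cudaHostAllocPortable);

    cudaSetDevice(0);                           /* GPU-1 */
    cudaMalloc((void **)&d_src, bytes);
    /* ... GPU-1 kernels produce their result in d_src ... */
    cudaMemcpy(h_shared, d_src, bytes, cudaMemcpyDeviceToHost);

    cudaSetDevice(1);                           /* GPU-2 */
    cudaMalloc((void **)&d_dst, bytes);
    cudaMemcpy(d_dst, h_shared, bytes, cudaMemcpyHostToDevice);
    /* ... GPU-2 kernels consume d_dst ... */

    cudaFree(d_dst);
    cudaSetDevice(0);
    cudaFree(d_src);
    cudaFreeHost(h_shared);
    return 0;
}
```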

As for the original question: no, you can’t do a direct GPU-to-GPU transfer in CUDA today; the data has to be staged through host memory.

Oops… thanks for the correction. GPUDirect is more relevant in a cluster environment, where the InfiniBand adapter and the GPU share a pinned buffer to reduce the number of buffer copies needed for GPU-to-GPU communication across nodes.