I am developing a dual-GPU system in which a data set is processed first by GPU-1 and then by GPU-2.
Currently I allocate a staging buffer on the host, copy the data processed by GPU-1 into it with a Device-to-Host copy, and then transfer it to GPU-2 with a Host-to-Device copy.
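For reference, my current staging path looks roughly like this (a sketch with placeholder names; `d_src`, `d_dst`, and `h_buf` are assumed to be already allocated, with GPU-1 as device 0 and GPU-2 as device 1):

```cpp
#include <cuda_runtime.h>

// Sketch of the two-hop transfer I described: GPU-1 -> host -> GPU-2.
// d_src resides on device 0, d_dst on device 1, h_buf is a host buffer
// of at least `bytes` bytes; error checking omitted for brevity.
void stagedCopy(void* d_dst, const void* d_src, void* h_buf, size_t bytes)
{
    cudaSetDevice(0);                                         // GPU-1
    cudaMemcpy(h_buf, d_src, bytes, cudaMemcpyDeviceToHost);  // hop 1

    cudaSetDevice(1);                                         // GPU-2
    cudaMemcpy(d_dst, h_buf, bytes, cudaMemcpyHostToDevice);  // hop 2
}
```

Both copies cross the PCI-E bus, which is why I suspect this is the bottleneck.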
This does not seem very efficient because of the PCI-E bandwidth limit. Is there a direct way to transfer memory from one GPU to the other? What about SLI? Does CUDA support SLI?
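To frame what I am hoping for: I have seen the peer-to-peer functions `cudaDeviceCanAccessPeer`, `cudaDeviceEnablePeerAccess`, and `cudaMemcpyPeer` mentioned in the CUDA runtime API docs, and I imagine a direct copy would look something like the sketch below. I have not verified that my hardware supports this, so treat it as a guess at the intended usage, not working code:

```cpp
#include <cuda_runtime.h>

// Guess at a direct GPU-to-GPU copy using the peer-to-peer API.
// d_src resides on device 0 (GPU-1), d_dst on device 1 (GPU-2);
// error checking omitted for brevity.
void peerCopy(void* d_dst, const void* d_src, size_t bytes)
{
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 1, 0);  // can device 1 read device 0?

    if (canAccess) {
        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);       // enable P2P toward device 0

        // Single copy: no host staging buffer involved.
        cudaMemcpyPeer(d_dst, 1, d_src, 0, bytes);
    }
}
```

Is this the right mechanism, and does it actually bypass the host, or does the driver still route the data through system memory on some configurations?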