Copy between two devices: seeking instructions on direct copy between two devices

I am trying to figure out whether (and how) I can copy directly from the device memory of one GPU to the device memory of another GPU, preferably asynchronously.
I am working with a 4-way Fermi board.