DUAL GPU VIDEOBOARD INTERNAL COPY OPTIMIZATION

What is the best way to copy data on the device (without transferring it to the host) between two CUDA contexts on dual-GPU video boards like the NV295? Thank you very much!