copy cudaArray from one device to another

K_Bondar · July 25, 2013, 2:43pm

Hi!
My program has the following structure:

copy some rendering data (cudaArray*) from GPU 1 to GPU 2 with cudaMemcpyArrayToArray
execute kernels in parallel on both GPUs
copy the output from GPU 2 back to GPU 1

The problem is that cudaMemcpyArrayToArray is extremely slow, even though the data is only about 50 MB.
Why is it so slow, and is there a way to copy a cudaArray from one device to another without cudaMemcpyArrayToArray?
Thanks!
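
For reference, one documented route for array-to-array copies between devices is cudaMemcpy3DPeer, which takes a source and destination cudaArray together with their device ordinals. The following is only a minimal sketch and not a measured fix: the device ordinals (0 and 1), the float4 element type, the 256 x 256 x 64 extent (about 64 MB) and the use of 3D arrays are all assumptions made for illustration, and whether this is any faster than cudaMemcpyArrayToArray depends on whether the two GPUs support peer access.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e = (call);                                       \
        if (e != cudaSuccess) {                                       \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(e));                           \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Enable direct access from `dev` to `peer` when the hardware allows it.
static void tryEnablePeer(int dev, int peer) {
    int canAccess = 0;
    CHECK(cudaDeviceCanAccessPeer(&canAccess, dev, peer));
    if (canAccess) {
        CHECK(cudaSetDevice(dev));
        CHECK(cudaDeviceEnablePeerAccess(peer, 0));
    } else {
        printf("device %d cannot access device %d directly\n", dev, peer);
    }
}

int main() {
    const int srcDev = 0, dstDev = 1;  // assumed device ordinals

    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    if (count < 2) {
        printf("this sketch needs two GPUs\n");
        return 0;
    }

    // Assumed layout: 3D array of float4, 256 x 256 x 64 elements (~64 MB).
    cudaExtent extent = make_cudaExtent(256, 256, 64);
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float4>();

    cudaArray_t srcArr = nullptr, dstArr = nullptr;
    CHECK(cudaSetDevice(srcDev));
    CHECK(cudaMalloc3DArray(&srcArr, &desc, extent));
    CHECK(cudaSetDevice(dstDev));
    CHECK(cudaMalloc3DArray(&dstArr, &desc, extent));

    // Without peer access the copy may be staged through host memory.
    tryEnablePeer(srcDev, dstDev);
    tryEnablePeer(dstDev, srcDev);

    // Describe the array-to-array copy between the two devices.
    cudaMemcpy3DPeerParms p = {};
    p.srcArray  = srcArr;
    p.srcDevice = srcDev;
    p.dstArray  = dstArr;
    p.dstDevice = dstDev;
    p.extent    = extent;  // in elements, because CUDA arrays participate
    CHECK(cudaMemcpy3DPeer(&p));
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaSetDevice(srcDev));
    CHECK(cudaFreeArray(srcArr));
    CHECK(cudaSetDevice(dstDev));
    CHECK(cudaFreeArray(dstArr));
    printf("copy finished\n");
    return 0;
}
```

If cudaDeviceCanAccessPeer reports that the GPUs cannot access each other, the transfer may go through host memory, which could be one reason a copy of this size is slow.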