How can I transfer data between the memories of two GPUs?

Hi!

I have a question about multi-GPU programming.

I have allocated a variable on each GPU with cudaMalloc, but I need to sum the values of these variables and store the sum back in the variable on each GPU. How can I solve this efficiently?

Any reply is appreciated.

The code looks like this:

/////

for(i=0;i<GPU_N;i++)

{

cudaSetDevice(i);

cudaMalloc(&d_val[i],size[i]);

}

for(i=0;i<GPU_N;i++)

{

cudaSetDevice(i);

cublasSetStream(handles[i],streams[i]);

cublasSgemm(handles[i],…,d_val[i]); // compute the value of d_val[i]

}

//// Sum the values

cudaDeviceSynchronize(); // waits only for the current device (the last one set above)

cudaSetDevice(0);

float *h_sum= new float;

for(i=0;i<GPU_N;i++)

{

cudaSetDevice(i);

cublasGetVector(…,h_val[i],d_val[i]);

}

hostsum(h_sum,h_val); // do the sum on the host

for(i=0;i<GPU_N;i++)

{

cudaSetDevice(i);

cublasSetVector(…,h_sum,d_val[i]);

}

Now I want the sum of the d_val[i] values to be available in each GPU's memory.

Is there a more efficient method than the one I use here?
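
For reference, hostsum is not shown above. A minimal sketch of what such a host-side sum might look like is given here, assuming every h_val[i] points to n floats and h_sum has room for n floats. The name hostsum_sketch and its gpu_n and n parameters are hypothetical and only for illustration.

```cpp
#include <cstddef>

// Hypothetical stand-in for hostsum(): adds the gpu_n host buffers
// element by element. Assumes every h_val[i] points to n floats and
// that h_sum has room for n floats.
void hostsum_sketch(float* h_sum, const float* const* h_val,
                    std::size_t gpu_n, std::size_t n)
{
    for (std::size_t j = 0; j < n; ++j) {
        float acc = 0.0f;
        for (std::size_t i = 0; i < gpu_n; ++i) {
            acc += h_val[i][j];  // contribution copied back from GPU i
        }
        h_sum[j] = acc;          // element j of the total
    }
}
```

With this layout, every byte goes from each GPU to the host and back again, which is the round trip I would like to avoid.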