Hi!
I have a question about multi-GPU programming.
I have allocated a variable on each GPU with cudaMalloc, but I need to sum
the values of these variables and write the sum back to the variable on each GPU.
How can I solve this problem efficiently?
Any reply is appreciated.
My code looks like this:
/////
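// Allocate one buffer per GPU
for (i = 0; i < GPU_N; i++)
{
    cudaSetDevice(i);
    cudaMalloc(&d_val[i], size[i]);
}
// Launch the computation on each GPU; the result ends up in d_val[i]
for (i = 0; i < GPU_N; i++)
{
    cudaSetDevice(i);
    cublasSetStream(handles[i], streams[i]);
    cublasSgemm(handles[i], …, d_val[i]);
}
// Sum the values on the host
cudaSetDevice(0);
float *h_sum = new float;
for (i = 0; i < GPU_N; i++)
{
    cudaSetDevice(i);
    // cudaDeviceSynchronize() only waits for the current device,
    // so it is called once per GPU before copying the result back
    cudaDeviceSynchronize();
    cublasGetVector(…, h_val[i], d_val[i]);
}
hostsum(h_sum, h_val); // do the sum
// Copy the sum back to every GPU
for (i = 0; i < GPU_N; i++)
{
    cudaSetDevice(i);
    cublasSetVector(…, h_sum, d_val[i]);
}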
Now I want the sum of all d_val[i] to be available in each GPU's memory.
Is there a more efficient method than the one I use?
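
For reference, the pattern described above (sum a buffer across all GPUs and leave the result on every GPU) is usually called an all-reduce, and one library that may fit is NCCL. The following is only a minimal sketch under several assumptions: NCCL is installed, every GPU holds the same number of floats (count is a hypothetical value, whereas the code above uses size[i]), the cudaMemset is a placeholder for the cublasSgemm step, and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <nccl.h>

int main() {
    int nDev = 0;
    cudaGetDeviceCount(&nDev);
    if (nDev < 1) { std::printf("no CUDA device found\n"); return 1; }

    const size_t count = 1024;  // hypothetical: number of floats per GPU

    std::vector<int> devs(nDev);
    std::vector<float*> d_val(nDev);
    std::vector<cudaStream_t> streams(nDev);
    std::vector<ncclComm_t> comms(nDev);

    for (int i = 0; i < nDev; ++i) {
        devs[i] = i;
        cudaSetDevice(i);
        cudaMalloc(&d_val[i], count * sizeof(float));
        // Placeholder for the real computation (e.g. cublasSgemm into d_val[i])
        cudaMemset(d_val[i], 0, count * sizeof(float));
        cudaStreamCreate(&streams[i]);
    }

    // One communicator per GPU, all owned by this single process
    ncclCommInitAll(comms.data(), nDev, devs.data());

    // In-place all-reduce: afterwards every d_val[i] holds the element-wise sum
    ncclGroupStart();
    for (int i = 0; i < nDev; ++i) {
        ncclAllReduce(d_val[i], d_val[i], count, ncclFloat, ncclSum,
                      comms[i], streams[i]);
    }
    ncclGroupEnd();

    // Wait for the reduction to finish on every GPU
    for (int i = 0; i < nDev; ++i) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
    }

    // Clean up
    for (int i = 0; i < nDev; ++i) {
        cudaSetDevice(i);
        ncclCommDestroy(comms[i]);
        cudaStreamDestroy(streams[i]);
        cudaFree(d_val[i]);
    }
    return 0;
}
```

If the all-reduce is enqueued on the same streams[i] that cublasSgemm uses, it should run after the GEMM without an extra host-side synchronization, and the data never has to pass through host memory. Whether this is actually faster than the host round trip depends on the buffer size and on how the GPUs are connected.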