[Solved] OpenMP + multi-GPU: avoid variable duplication

Hi,

I have a problem using OpenMP with GPUs.
Consider the following code (just a template; you can find a working example here:
https://mbarbry.pagekite.me/owncloud/s/CPTwhRP3gDEOWx7)

cudaMalloc(A_d)
cudaMemcpy(A_d, A, hostToDevice)
#pragma omp parallel for shared(A_d, nf) private(B, C, f)
for (f = 0; f < nf; f++)
{
    perform_operation_gpu(A_d, B, C, f);
}

This code does not work. To make it work, I need to allocate and copy A inside the OpenMP parallel region. But that is very inefficient in terms of memory, since A is a constant. How can the GPU threads share the variable A_d?

What I want is to allocate and copy A to the GPU only once, from the master thread. But so far I have only got my code working when every CPU thread allocates and copies the matrix A to the GPU, which duplicates the data on the GPU.

Thank you for your help

can you provide the code that you are trying to compile instead of placing it in comments? i also suggest using http://pastie.org/ or https://gist.github.com/ (although, well, you also have a Makefile here…)

Thank you for your answer.

Here is the non-working version.
http://pastie.org/10850299

thanks. with only one change “const int N = 3;” this successfully compiles here on windows (msvc2013+openmp2.5+cuda7.5). the error is here:

checkCudaErrors(cudaFree(A_d));
checkCudaErrors(cudaFree(B_d));
checkCudaErrors(cudaFree(C_d));

A_d is shared by all threads, so it must not be freed inside the parallel region. by moving cudaFree(A_d) outside of the omp parallel block, i successfully ran your code:

M:\x>compile-cuda.cmd -Xcompiler /openmp pastie-10850299.cu
pastie-10850299.cu
pastie-10850299.cu(98): warning: integer conversion resulted in a change of sign

pastie-10850299.cu(98): warning: integer conversion resulted in a change of sign

   Creating library a.lib and object a.exp

M:\x>a.exe
Print A:
mat[0, 0] = 0
mat[0, 1] = 2
mat[0, 2] = 4
mat[1, 0] = 1
mat[1, 1] = 3
mat[1, 2] = 5
mat[2, 0] = 2
mat[2, 1] = 4
mat[2, 2] = 6
SUM_REF = 333
CPU thread 1 uses CUDA device 0
CPU thread 4 uses CUDA device 0
CPU thread 0 uses CUDA device 0
CPU thread 6 uses CUDA device 0
CPU thread 5 uses CUDA device 0
CPU thread 7 uses CUDA device 0
CPU thread 3 uses CUDA device 0
CPU thread 2 uses CUDA device 0
SUM = 333
SUM - SUM_REF = 0

M:\x>

i placed the working code at https://gist.github.com/Bulat-Ziganshin/7768dbf610981155f53424c93de6b9c2

thank you for the example of using CUDA with OpenMP!
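For readers who cannot open the links, here is a minimal sketch of the single-GPU pattern discussed above: A_d is allocated and uploaded once before the parallel region, read by every thread, and freed once after it. This is not the code from the pastie or the gist. The kernel `scale`, the function `run_shared`, and the sizes n = 9 and nf = 8 are made up for illustration, and error checking is mostly left out. It relies on the fact that, with the CUDA runtime API, host threads of one process that use the same device share its context, so a device pointer created by one thread can be used by the others.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <omp.h>

// Hypothetical kernel standing in for perform_operation_gpu: B[i] = f * A[i].
__global__ void scale(const float *A, float *B, int n, float f) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) B[i] = f * A[i];
}

// Sums f * A[i] over all f and i, with A[i] = i. Returns -1 if A_d cannot be allocated.
double run_shared(int nf) {
    const int n = 9;
    float A[n];
    for (int i = 0; i < n; ++i) A[i] = (float)i;

    // Allocate and upload A exactly once, before the parallel region.
    float *A_d = nullptr;
    if (cudaMalloc(&A_d, n * sizeof(float)) != cudaSuccess) return -1.0;
    cudaMemcpy(A_d, A, n * sizeof(float), cudaMemcpyHostToDevice);

    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int f = 0; f < nf; ++f) {
        // Scratch buffers are declared inside the loop, so each thread has its own.
        float B[n] = {0};
        float *B_d = nullptr;
        cudaMalloc(&B_d, n * sizeof(float));
        scale<<<1, 32>>>(A_d, B_d, n, (float)f);  // A_d is shared and only read
        cudaMemcpy(B, B_d, n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(B_d);
        for (int i = 0; i < n; ++i) sum += B[i];
    }

    // Free the shared buffer once, after every thread is done with it.
    cudaFree(A_d);
    return sum;
}

int main() {
    printf("SUM = %g\n", run_shared(8));
    return 0;
}
```

On Windows this can be built the same way as in the log above (`-Xcompiler /openmp`); with gcc as the host compiler the flag is `-Xcompiler -fopenmp`.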

Yes, it works in the single-GPU case!
Thanks a lot. I will now try to make it work with multiple GPUs.

I finally managed to write a multi-GPU version.
If anyone has remarks and/or improvements, I would be happy to read them.

http://pastie.org/10850654

Thanks for your help BulatZiganshin!
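In case the pastie link goes away, here is one possible way to structure a multi-GPU version. It is a sketch under assumptions, not the code from the link: the kernel `scale`, the function `run_multi_gpu`, the thread-to-device mapping (thread number modulo device count), and the sizes are all hypothetical. A device pointer is only valid on the device it was allocated on, so this sketch keeps one copy of A per GPU (not one per CPU thread), and each thread calls cudaSetDevice before touching its device, since the current device is a per-host-thread setting.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <omp.h>

// Hypothetical kernel standing in for perform_operation_gpu: B[i] = f * A[i].
__global__ void scale(const float *A, float *B, int n, float f) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) B[i] = f * A[i];
}

// Sums f * A[i] over all f and i, with A[i] = i. Returns -1 if no device is found.
double run_multi_gpu(int nf) {
    const int n = 9;
    float A[n];
    for (int i = 0; i < n; ++i) A[i] = (float)i;

    int ndev = 0;
    if (cudaGetDeviceCount(&ndev) != cudaSuccess || ndev == 0) return -1.0;

    // One copy of A per device, uploaded before the parallel region.
    std::vector<float *> A_d(ndev, nullptr);
    for (int d = 0; d < ndev; ++d) {
        cudaSetDevice(d);
        cudaMalloc(&A_d[d], n * sizeof(float));
        cudaMemcpy(A_d[d], A, n * sizeof(float), cudaMemcpyHostToDevice);
    }

    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int f = 0; f < nf; ++f) {
        // Assumed mapping: spread the CPU threads round-robin over the devices.
        int dev = omp_get_thread_num() % ndev;
        cudaSetDevice(dev);  // applies to the calling host thread only

        float B[n] = {0};
        float *B_d = nullptr;
        cudaMalloc(&B_d, n * sizeof(float));
        scale<<<1, 32>>>(A_d[dev], B_d, n, (float)f);  // read this device's copy of A
        cudaMemcpy(B, B_d, n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(B_d);
        for (int i = 0; i < n; ++i) sum += B[i];
    }

    // Free each per-device copy once, after the parallel region.
    for (int d = 0; d < ndev; ++d) {
        cudaSetDevice(d);
        cudaFree(A_d[d]);
    }
    return sum;
}

int main() {
    printf("SUM = %g\n", run_multi_gpu(8));
    return 0;
}
```

With this layout the GPU memory used for A grows with the number of devices, not with the number of CPU threads.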

thank you for the extended example