OpenACC: values in shared variables were not changed

Hello,

I have a question about using present() and the #pragma acc data directive with loops.

Please take a look at the code below for your reference:

#include<iostream>
#include<openacc.h>
#define N 5

using namespace std;

int main(){
int a[N];
int i,j=0;
#pragma acc data copyin(a[:N],i,j) //can not use present(i,j),device can not find any data
{

    #pragma kernels present(a[:N],j,i)
{	
for(int i=0;i<N;i++){
           for(int j=0;j<N;j++){
             a[i]=i+j;         
                    
	}
   }
 }
   #pragma acc data copyout(a[:N],j,i)

}
cout<<"a[N]"<<a[N-1]<<endl;
cout<<i<<endl;
cout<<j<<endl;
}

At the end I got the result:

a[N]8
0
0

I wonder whether I missed something. From the result, the host did receive a[N-1] from the device, but it did not receive the values of i and j. Regarding variable scope, I thought the for loops could share a and j, but maybe that is incorrect. I don’t know how to show the values of i and j after the computation in the loops.

I hope someone can provide a hint.

Hello,

I notice a few things that can lead to these results.

The present clause is not required on the kernels directive, because the array a has just been copied into device memory by the enclosing data region.

The main problem I see here is that i and j are not shared. Each thread will have its own internal copy of i and j (otherwise you would have a race condition on i and j). This also causes problems in the computation of your loop.

Hello Keroro,

Thanks for your suggestion. May I know how I can make a variable shared? Do I need to use present()?

Furthermore, I have another question. When I changed the directives as below, I got this result:
#include<iostream>
#include<openacc.h>
#define N 5

using namespace std;

int main(){

    int a[N];
         int i,j=0;

#pragma acc data copyin(N,a[:N],i,j) //present(i,j)

{

    #pragma kernels present(i,j,a[:N]) //copyout(a[:N],i,j)
    {

    for(;i<N;i++){
                  for(;j<N;j++){
             a[i]=i+j;

            }


      }
    #pragma acc data copyout(a[:N])
    }

#pragma acc data copyout(j,i)
}
cout<<"a[N]"<<a[N-1]<<endl;
cout<<i<<endl;
cout<<j<<endl;

}

a[N]32513
5
5
I don’t understand why I get a random number for a[N-1].

Hi glaciya2018,

In your first post, as I’ve noted in my previous answers, the i and j declared in main are different from the i and j declared in the for loops. Though it appears that you may have figured that out, given that you removed the inner loop declarations in the second post.
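The shadowing can be seen without any OpenACC at all. Here is a minimal sketch in plain C++ that mirrors the loops from your first post. The helper name `outer_indices_after_loops` is only for illustration, and I initialize the outer `i` to 0 (your post leaves it uninitialized) so the behavior is defined. The loops declare their own `i` and `j`, so the outer pair is never written.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper, not from the original post: runs the nested loops
// with loop-local indices and returns the OUTER (i, j) afterwards.
std::pair<int, int> outer_indices_after_loops(std::vector<int>& a) {
    assert(!a.empty());
    const int n = static_cast<int>(a.size());
    int i = 0, j = 0;                    // outer variables (i set to 0 here by assumption)
    for (int i = 0; i < n; i++) {        // this i shadows the outer i
        for (int j = 0; j < n; j++) {    // this j shadows the outer j
            a[i] = i + j;                // last write per element is i + (n - 1)
        }
    }
    return {i, j};                       // the outer values, untouched by the loops
}
```

Called with a five-element vector, this returns (0, 0) and leaves the last element equal to 8, which matches the output in your first post.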

However, there are still several issues with your second post.

First, there’s no “#pragma kernels”. This should be “#pragma acc kernels”.

Second, the data region uses “copyin”, so the variables aren’t copied back from the device. The second data region is nested within the first, so it is extraneous: besides having no structured block to apply to, the variables are already present on the device, and given OpenACC’s “present_or” semantics, no copy will be made. To synchronize memory within a data region, you’ll want to use the “update” directive.

#pragma acc update self(a[:N],j,i)

Though in this case it’s simpler to change “copyin” to “copy” so the data is copied back at the end of the data region.

Last, why are you globalizing the loop index variables? As I’ve noted before, this will prevent the code from being parallelized. In order to parallelize the code, the loop index variables must be private, else it causes a race condition.
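Sharing the indices also changes what the loops compute, independent of OpenACC. Below is a minimal host-only sketch of the loops from your second post, assuming `i` starts at 0 (your post leaves it uninitialized). The helper name `fill_without_reset` is hypothetical, and -1 is used as a marker for "never written". Because `j` is never reset, the inner loop only runs while `i` is 0, so the remaining elements are never assigned. That is the likely reason you printed a random value for `a[N-1]`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not from the original post: runs the nested loops
// with indices declared outside the loops and no reset of j.
std::vector<int> fill_without_reset(int n) {
    assert(n > 0);
    std::vector<int> a(n, -1);   // -1 marks "never written"
    int i = 0, j = 0;            // i = 0 is an assumption; the post leaves it uninitialized
    for (; i < n; i++) {
        for (; j < n; j++) {     // j stays at n after the first pass
            a[i] = i + j;
        }
    }
    return a;                    // only a[0] has been assigned
}
```

With n = 5, only the first element is written (its final value is 4) and the other four keep the -1 marker. In your program those elements hold whatever was in the uninitialized array.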

Here’s the corrected code. I left in the global index variables, but as you can see from the compiler feedback messages, the code will get run sequentially on the device.

% cat test.cpp
  #include<iostream>
  #include<openacc.h>
  #define N 5
  using namespace std;
  int main(){

int a[N];
int i,j=0;
#pragma acc data copy(a[:N],i,j) //can not use present(i,j),device can not find any data
{
    #pragma acc kernels present(a[:N],j,i)
    {
        for(i=0;i<N;i++){
           for(j=0;j<N;j++){
             a[i]=i+j;
           }
       }
    }
}
cout<<"a[N]"<<a[N-1]<<endl;
cout<<i<<endl;
cout<<j<<endl;
}

% nvc++ -acc -Minfo=accel test.cpp; a.out
main:
     10, Generating copy(a[:],j,i) [if not already present]
     12, Generating present(j,i,a[:])
         Accelerator serial kernel generated
         Generating NVIDIA GPU code
         13, #pragma acc loop seq
         14, #pragma acc loop seq
     13, Loop carried scalar dependence for i at line 15,17
         Parallelization would require privatization of array a[i]
         Conditional loop will be executed in scalar mode
         Loop carried scalar dependence for i at line 17
         Parallelization would require privatization of array a[i]
         Conditional loop will be executed in scalar mode
     14, Parallelization would require privatization of array a[i]
     15, Accelerator restriction: induction variable live-out from loop: j
     16, Accelerator restriction: induction variable live-out from loop: j
a[N]8
5
5

Here’s a second example where I remove the global index variables, thus allowing the outer loop to be parallelized. While not needed, I also switched the code to use the “update” directive so you can see it in context.

% cat test.cpp
  #include<iostream>
  #include<openacc.h>
  #define N 5
  using namespace std;
  int main(){

int a[N];
#pragma acc data create(a[:N])
{
    #pragma acc update device(a[:N])
    #pragma acc kernels present(a[:N])
    {
        for(int i=0;i<N;i++){
           for(int j=0;j<N;j++){
             a[i]=i+j;
           }
       }
    }
    #pragma acc update self(a[:N])
}
cout<<"a[N]"<<a[N-1]<<endl;
}

% nvc++ -acc -Minfo=accel test.cpp ; a.out
main:
      9, Generating create(a[:]) [if not already present]
     12, Generating update device(a[:])
         Generating present(a[:])
     13, Loop is parallelizable
     14, Loop carried reuse of a prevents parallelization
         Inner sequential loop scheduled on accelerator
         Generating NVIDIA GPU code
         13, #pragma acc loop gang, vector(32) /* blockIdx.x threadIdx.x */
         14, #pragma acc loop seq
     14, Loop carried reuse of a prevents parallelization
     20, Generating update self(a[:])
a[N]8