default(none) directive behaviour as per OpenMP?

Hello,

I’m porting an OpenMP code to OpenACC. In the OpenMP version I have a parallel section with DEFAULT(NONE), and the compiler refuses to compile the code until I specify, for each and every variable, whether it should be shared or private to each thread. This is more work on my side, but for the initial porting I actually prefer it, since it makes me think about the role of each variable.

So I have three questions:

  1. Now, when porting to OpenACC I was hoping that !$acc parallel loop default(none) would just do the same as OpenMP, but I can compile without issues (pgi 18.7-0) even without specifying the role of any of the variables in the loop. Is that normal? Can I ask the compiler somehow to force me to specify the role of each variable?

  2. What are the defaults for the parallel, loop and parallel loop directives? If I understand correctly: if I have a parallel directive and I specify private(v1), v1 will be private to each gang, so all threads created within will actually share it; if I then have a loop directive inside that parallel region, and I specify private(v2), this v2 will be private to each thread. What happens when I combine both so that I have !$acc parallel loop private(v_a)? I think v_a will be private to each thread, but just making sure.

  3. Is there a way to dump for each parallel region which variables are being shared, which are created, etc.?

Many thanks,
AdV

Hi AdV,

For #1, “default(none)” should give you something like the following error if the arrays don’t appear in a data clause. However, if the variables are in an outer data region within the same scope, the compiler will see these and apply them to the compute region. Also, “default(none)” only applies to variables that don’t have a predefined data attribute (like arrays), not to variables that are implicitly private (like scalars).

% pgfortran -ta=tesla test.F90
PGF90-S-0155-Data clause required with default(none): a(:,:) (test.F90: 25)
PGF90-S-0155-Data clause required with default(none): b(:,:) (test.F90: 25)
  0 inform,   0 warnings,   2 severes, 0 fatal for test
% pgcc -ta=tesla  test.c -c
PGC-S-0155-Data clause required with default(none): matrixProduct[start:end-start][:] (test.c: 49)
PGC-S-0155-Data clause required with default(none): matrixA[start:end-start][:] (test.c: 49)
PGC-S-0155-Data clause required with default(none): matrixB[:130][:] (test.c: 49)
PGC/x86-64 Linux 18.7-0: compilation completed with severe errors
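
As a minimal sketch of the behaviour described above (the function `scale_add` and its arguments are hypothetical, not taken from the test.c shown in the output): with “default(none)” the arrays `x` and `y` must appear in a data clause. The scalars `n` and `factor` are listed in firstprivate only to be explicit and portable; according to the reply above, PGI would not require them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y[i] = factor * x[i] + y[i]. */
void scale_add(int n, float factor, const float *x, float *y)
{
    assert(x != NULL && y != NULL);

    /* default(none): the arrays need explicit data clauses, otherwise PGI
       reports "Data clause required with default(none)".
       The scalars are listed explicitly by choice, not because PGI asks. */
    #pragma acc parallel loop default(none) \
        copyin(x[0:n]) copy(y[0:n]) firstprivate(n, factor)
    for (int i = 0; i < n; ++i) {
        y[i] = factor * x[i] + y[i];
    }
}
```

Removing the copyin and copy clauses from this sketch should reproduce the kind of error shown above. Without an OpenACC compiler the pragma is ignored and the loop simply runs serially.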



What happens when I combine both so that I have !$acc parallel loop private(v_a)? I think v_a will be private to each thread, but just making sure.

Private applies to the loop, so instead of thinking about how the compiler implements private, think of each iteration of the loop as having its own private copy of “v_a”. Borrowing from C, it’s like declaring “v_a” within the loop itself:

#pragma acc parallel loop private(v_a)
for (int i=0, ....
{
...
}

behaves like:

#pragma acc parallel loop 
for (int i=0, ....
{
    int v_a;   // each iteration has its own "v_a"
...
}

Though to answer your question more directly: when a loop has a combined schedule, the variable is private at the lowest level of that schedule. For example, if the loop is scheduled as “gang, vector”, then the variable is private to the vector.
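
A small sketch of that case, assuming a scratch scalar declared outside the loop, as is typical in code ported from Fortran (the function `square_shifted` and its arguments are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: out[i] = (in[i] + 1)^2 using a scratch scalar. */
void square_shifted(int n, const double *in, double *out)
{
    assert(in != NULL && out != NULL);

    double v_a;  /* declared outside the loop */

    /* Combined "gang vector" schedule: per the explanation above, v_a is
       private at the vector level, so each iteration gets its own copy. */
    #pragma acc parallel loop gang vector private(v_a) \
        copyin(in[0:n]) copyout(out[0:n])
    for (int i = 0; i < n; ++i) {
        v_a = in[i] + 1.0;
        out[i] = v_a * v_a;
    }
}
```

Because every iteration writes `v_a` before reading it, the result does not depend on how iterations are distributed.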


  3. Is there a way to dump for each parallel region which variables are being shared, which are created, etc.?

I’m not 100% clear what you’re asking here, but I think the answer is a partial no and partial yes.

For variables like arrays and module data that don’t have a predefined data attribute, if you leave off the “default(none)” and do not have an explicit data region, the compiler will need to implicitly copy them. Hence in the compiler feedback messages, you’ll see which arrays are being implicitly copied.
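
To illustrate, here is a sketch of a loop with no “default(none)” and no data clauses, so the arrays are left to the compiler to copy implicitly. The function `doubled_sum` and the size `N` are hypothetical. With the PGI compilers, the feedback messages are enabled with -Minfo=accel; the exact wording of the messages is not reproduced here.

```c
#include <assert.h>

#define N 8  /* hypothetical fixed array size */

/* Hypothetical example: fill a[], compute b[i] = 2*a[i], return the sum of b[]. */
float doubled_sum(int n)
{
    assert(n >= 0 && n <= N);

    float a[N] = {0.0f};
    float b[N] = {0.0f};

    for (int i = 0; i < n; ++i) {
        a[i] = (float)i;
    }

    /* No data clauses: a and b have no predefined data attribute, so the
       compiler has to copy them implicitly and can report that it did so. */
    #pragma acc parallel loop
    for (int i = 0; i < n; ++i) {
        b[i] = 2.0f * a[i];
    }

    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        sum += b[i];
    }
    return sum;
}
```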

For variables that do have a predefined data attribute, like most scalars, the compiler will not print any information. I’m not really convinced that having the compiler print this information would be useful, since it could be quite a lot and the user has very little control over it. Though, I’m very open to arguments as to why it could be useful.

Hope this helps,
Mat

Hi Mat,

thanks for the reply.

Regarding the last point, you say:

For variables that do have a predefined data attribute, like most scalars, the compiler will not print any information. I’m not really convinced that having the compiler print this information would be useful, since it could be quite a lot and the user has very little control over it. Though, I’m very open to arguments as to why it could be useful.

I guess I was asking this because of the issues I’m having with porting a particular code. For a given loop, I wrote !$acc parallel loop default(none), assuming that all the scalars would be private to the loop, but this was giving me incorrect results, so I had to explicitly write !$acc parallel loop default(none) private(ip,weight,…), for all the variables that need to be loop-private and then the results were OK.

So, I assume that if I don’t specify the private clause, the scalars will be private, but to the gang, not to the loop?

Then, if that is so, in order to make sure I get correct results I have to include in the private clause all the scalar variables that I need to be private to each loop. But since the compiler will not force me (even after setting default(none)) to specify the role of the scalars, I can forget some of the required loop variables, and this can obviously lead to incorrect results.

So, if the compiler could dump a list of all the variables (scalars included), where it could give information on whether each variable was going to be private to the gang or only to the loop, then I could use that information to debug the code.

I hope my reasoning is clear, though perhaps I’m misunderstanding something.

Any help/pointers appreciated,
AdV

I wrote !$acc parallel loop default(none), assuming that all the scalars would be private to the loop, but this was giving me incorrect results, so I had to explicitly write !$acc parallel loop default(none) private(ip,weight,…), for all the variables that need to be loop-private and then the results were OK.

By default, scalars are firstprivate (or private for the kernels construct). However, there are a few exceptions:

  1. In Fortran, scalars declared as module data, or in C/C++ global scalars.
  2. When a scalar is passed by reference (default in Fortran) to a device subroutine.

In both cases, other references to the scalar may exist, so the compiler must assume that they do. It is therefore not safe for it to automatically privatize the scalar.
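
As a sketch of the first exception in C, consider a file-scope scalar used as a per-iteration temporary. The names `tmp` and `halve_and_shift` are hypothetical. Since the compiler will not privatize `tmp` on its own, it is listed in the private clause explicitly:

```c
#include <assert.h>
#include <stddef.h>

/* File-scope (global) scalar: one of the exceptions listed above,
   so it is not privatized automatically. */
double tmp;

/* Hypothetical example: out[i] = 0.5 * in[i] + 1. */
void halve_and_shift(int n, const double *in, double *out)
{
    assert(in != NULL && out != NULL);

    /* Without private(tmp), all iterations could share one tmp on the
       device and race on it. */
    #pragma acc parallel loop private(tmp) \
        copyin(in[0:n]) copyout(out[0:n])
    for (int i = 0; i < n; ++i) {
        tmp = 0.5 * in[i];
        out[i] = tmp + 1.0;
    }
}
```

Whether this is what happened with ip and weight in the original code cannot be told without seeing it; the sketch only shows the general pattern.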

I’d need to see the code for this case to determine exactly why you needed to explicitly declare these scalars as private.

So, I assume that if I don’t specify the private clause, the scalars will be private, but to the gang, not to the loop?

Yes, you can assume scalars are private, with the exceptions noted above. Though when a scalar is implicitly private, the compiler determines at which level to make it private based on the schedule applied. So it could be gang-private, but also vector-private, depending on the schedule.

But since the compiler will not force me (even after setting default(none)) to specify the role of the scalars, I can forget some of the required loop variables, and this can obviously lead to incorrect results.

You might try using “kernels” instead of “parallel”. With “kernels”, the compiler performs an extra analysis step and will not parallelize a loop unless it finds no dependencies. For the exceptions noted above, “kernels” will not parallelize the loop, and will instead give messages about loop dependencies or “live-out” issues.

With “parallel”, you are asserting to the compiler that there are no dependencies and it should go ahead and parallelize the loop.
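
A minimal sketch of the “kernels” form, with hypothetical names (`double_and_shift`, `t`). The `restrict` qualifiers are an addition here: in C they tell the compiler that `in` and `out` do not overlap, which its dependency analysis generally needs to know.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: out[i] = 2 * in[i] + 1. */
void double_and_shift(int n, const double *restrict in, double *restrict out)
{
    assert(in != NULL && out != NULL);

    double t;  /* scratch scalar declared outside the loop */

    /* "kernels": the compiler analyses the loop itself and decides whether
       it is safe to parallelize, rather than taking the programmer's word. */
    #pragma acc kernels copyin(in[0:n]) copyout(out[0:n])
    for (int i = 0; i < n; ++i) {
        t = 2.0 * in[i];
        out[i] = t + 1.0;
    }
}
```

If the analysis finds a problem with a scalar such as `t`, the compiler feedback is where it would be reported, which makes this form useful as a check during porting.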

So, if the compiler could dump a list of all the variables (scalars included), where it could give information on whether each variable was going to be private to the gang or only to the loop, then I could use that information to debug the code.

Again, I think using “kernels” is your best bet here. The compiler should tell you if any of the scalars need to be explicitly privatized.

-Mat