Implicit behaviour of variables inside compute construct

I am using OpenACC, with pgfortran.
The OpenACC manual says that, implicitly

  • scalars are firstprivate
  • arrays are copy

This simple program gives output as if BOTH are copy.
Please clarify.

PROGRAM main
integer i,j(2)
i = 10
j(1) = 20
j(2) = 30

!$acc parallel
i = -10
j(1) = -20
!$acc end parallel

write(*,*) 'i = ', i
write(*,*) 'j = ', j

END PROGRAM main

Incidentally, gfortran (GCC) compiled code gives expected output.

Thanks,
Arun

Hi Arun,

Correct, scalar variables in a parallel region are implicitly firstprivate. The exceptions are when the scalar is used in a reduction, appears in a data clause, is a live-out variable, is a module variable (or a global reference), or is passed by reference to a device routine.

Since “i” is assigned in the parallel region and then used on the host afterwards, the compiler reports “Scalar last value needed after loop for i at line 12”. This forces “i” to be a shared rather than a private variable. Otherwise, the code would give different answers depending on whether you run with or without OpenACC. This is really a bug in your code, since a live-out variable can prevent parallelization and can cause inconsistent behavior between the host and device.

% nvfortran -fast -acc test4.F90 -Minfo=accel -V20.5 ; a.out
main:
      7, Scalar last value needed after loop for i at line 12
         Generating Tesla code
         Generating implicit copyout(j(1)) [if not already present]
         Scalar last value needed after loop for i at line 12

You can override the implicit behavior by explicitly adding a “firstprivate(i)” or “private(i)” clause on the parallel region, but you would then get incorrect answers, in the sense that the host value of “i” no longer matches a run without OpenACC.
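As a minimal sketch of that override: this is the program from the question with only the clause added (the program name firstprivate_demo is made up for illustration).

```fortran
program firstprivate_demo
  implicit none
  integer :: i, j(2)

  i = 10
  j(1) = 20
  j(2) = 30

  ! Explicit clause: i is copied in as a private value,
  ! so the assignment below is not copied back to the host.
  !$acc parallel firstprivate(i)
  i = -10
  j(1) = -20
  !$acc end parallel

  write(*,*) 'i = ', i
  write(*,*) 'j = ', j
end program firstprivate_demo
```

Built with OpenACC enabled, the host copy of “i” should keep its original value of 10. Built without it, the directives are plain comments and “i” ends up as -10. That mismatch is the inconsistency described above.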

Incidentally, gfortran (GCC) compiled code gives expected output.

GNU’s analysis isn’t quite as comprehensive as ours, so they probably miss the live-out variable and go ahead and implicitly treat “i” as firstprivate. So while “i=10” is what you expect, it is wrong with respect to the answer you’d get running on the host. At the very least, they should give you a warning about the live-out variable so you have an opportunity to fix this bug in your code.
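If you do want the device update of “i” to reach the host, one option that does not depend on any compiler's live-out analysis is to name “i” in a data clause, since that is one of the exceptions listed above. The sketch below assumes that is the intent (the program name explicit_copy_demo is made up for illustration).

```fortran
program explicit_copy_demo
  implicit none
  integer :: i, j(2)

  i = 10
  j(1) = 20
  j(2) = 30

  ! Data clause makes the sharing explicit: i and j are copied to the
  ! device on entry and copied back to the host on exit.
  !$acc parallel copy(i, j)
  i = -10
  j(1) = -20
  !$acc end parallel

  write(*,*) 'i = ', i
  write(*,*) 'j = ', j
end program explicit_copy_demo
```

Every gang writes the same values here, so the result does not depend on execution order, and it should match a build without OpenACC.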

-Mat

Thank you Mat.
Your description explains the observed behavior.
I had 3 follow-up queries.

(1) Is the fact that the compiler forces the live-out variable (i) to be shared (copy), instead of private (the default firstprivate), specified in the OpenACC standard, or is it a design choice of the PGI compiler team?

(2) Further, you mention that the “code will give differing answers with or without OpenACC”.
I am not sure where this constraint is stated, or even why it is required.
If you were to explicitly give private or firstprivate, the code would give different answers with OpenACC and without OpenACC anyway.
Not sure if I am missing something.

(3) Finally, how is this related to the ability to parallelize? As I understand it, there is an implicit barrier at the !$acc end parallel.

Thanks,
arun

You can argue that it’s a design choice, or philosophy, based on the premise that programs should get consistent (correct) answers when run with or without OpenACC. Getting incorrect answers, no matter how fast, is of no use. Hence when the compiler does its analysis, it will err on the side of correctness. The choices it makes are displayed in the compiler feedback messages (-Minfo=accel), and a user may override them by explicitly adding clauses (like firstprivate). However, if the user makes this decision, they do so understanding the ramifications.

As for #3, why a “live-out” variable would prevent parallelism, I’ll try to illustrate using the following code. In your example, the parallel region is executed in gang-redundant mode, meaning all gangs execute the same code. The gangs are setting the same values to the same variables, so the order in which the values are set doesn’t matter. But let’s consider the following:

!$acc parallel loop
    do k=1,n
       i = k
       j(k) = i
    end do

    print *, i
    print *, j

When run in parallel, the order in which the iterations of the loop are executed is non-deterministic. Hence the value of “i” will be that of whichever iteration was executed last. To be correct (or at least consistent with the non-OpenACC version), “i” needs to be set to the last iteration’s value (n), which creates a dependency that requires the loop to be executed sequentially. The user can certainly add an explicit “private(i)”, which allows the loop to run in parallel, but the code will no longer get consistent answers. At least then they will know why.
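One way to keep the loop parallel and still get consistent answers is to remove the live-out use altogether: make “i” private and set its final value explicitly on the host. This is only a sketch under that assumption. The program name liveout_fix, the size n = 8, and the copyout(j) clause are chosen for illustration.

```fortran
program liveout_fix
  implicit none
  integer, parameter :: n = 8
  integer :: i, k
  integer :: j(n)

  ! private(i): no value of i is carried out of the loop,
  ! so the iterations are independent of each other.
  !$acc parallel loop private(i) copyout(j)
  do k = 1, n
     i = k
     j(k) = i
  end do

  ! The "last value" is known in advance, so assign it on the host.
  i = n

  print *, i
  print *, j
end program liveout_fix
```

With this change the host value of “i” no longer depends on iteration order, so builds with and without OpenACC should agree.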

Thanks, Mat. I think I get your point now.

Basically you are saying: if the user DOES NOT give any explicit data clause, then the behavior/output with and without OpenACC should be the same (or at least the user should be warned).

Ok fine.
If the above requirement is justifiable, the PGI compiler is doing a good job…

arun