Nvfortran OPENACC reduction problem/bug

Hello,
in nvfortran version 24.5 on Linux there seems to be some strange behavior with OpenACC reductions.

When I explicitly declare a reduction clause on an OpenACC parallel loop for several variables, but forget to list one additional variable that should also be reduced, the compiler adds an implicit reduction for the extra variable, and the explicitly declared reductions go wrong: it drops some of the reduction variables (it seems to keep only every second one). The implicitly reduced variable should also get a copy clause, but the compiler uses firstprivate instead. Here is a small Fortran program that shows it; look at the -Minfo=accel output. (I compiled it with mpifort -O2 -g -r8 -Minfo=accel -acc=gpu -gpu=cc70,lineinfo,nomanaged test.f90)

Some more questions:

  1. Can one also explicitly tell OpenACC not to do a reduction on a certain variable?
  2. If I use an array, e.g. a(3), I have to copy it in explicitly to make the reduction work,
    since OpenACC will not implicitly copy in an array, while it does so implicitly for scalar values.
    Is that correct? Sometimes one needs array elements, e.g. for a vector with three components
    in the x, y, z directions.

Thanks, Frank

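program test
  implicit none

  real array(1000)
  real var1,var2,var3,var4,var5
  integer i

  ! Fill the input so that every partial sum is easy to check by hand.
  do i=1,1000
    array(i) = 1.0
  end do

  var1=0.
  var2=0.
  var3=0.
  var4=0.
  var5=0.

  ! var5 is deliberately missing from the reduction clause:
  ! this omission is what triggers the behavior described above.
  !$acc parallel loop &
  !$acc reduction(+:var1,var2,var3,var4)
  do i=1,1000
    var1=var1+1 * array(i)
    var2=var2+2 * array(i)
    var3=var3+3 * array(i)
    var4=var4+4 * array(i)
    var5=var5+5 * array(i)
  end do
  !$acc end parallel loop

end program test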

Thank you for highlighting this - I just played with your example, and I’m convinced it’s definitely the wrong behavior. I’m going to open an internal bug report on this issue with compiler engineering and then I’ll respond back w/ the id number for tracking it externally and answers for the rest of your questions.

OK, thanks. At first I thought the number of reductions might be limited, but the actual problem is that when one reduction variable is not listed, things get mixed up. Frank

Bramkamp,

I’ve reported the issue as TPR#35927. Feel free to follow up later in this thread to ask about status as we go about getting the behavior fixed.

At first glance, the bug, as you noticed, is that the implicit reduction causes explicit reductions to be missed/skipped in a strange order. That behavior is definitely incorrect. Whether OpenACC should pick up the auto reduction pattern on the unlisted variable and help the user out is something for the compiler and standard to decide - which is higher than my level, for sure.

However, if you want to avoid var5 being reduced, you can list it in a firstprivate clause on the parallel loop directive. This has the added benefit of fixing whatever bug is happening to the var1-var4 reductions when var5 is implicitly reduced. I think this is the best answer I have for your first extra question.
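A minimal sketch of the firstprivate workaround described above, based on the reproducer from the first post. The program name and the parameter n are made up for illustration. With firstprivate the loop updates only device-side copies of var5, so its value on the host after the loop should not be relied on.

```fortran
program firstprivate_workaround
  implicit none
  integer, parameter :: n = 1000   ! hypothetical name, same size as the reproducer
  real :: array(n)
  real :: var1, var2, var3, var4, var5
  integer :: i

  array = 1.0
  var1 = 0.0
  var2 = 0.0
  var3 = 0.0
  var4 = 0.0
  var5 = 0.0

  ! var5 is listed as firstprivate, so the compiler has no reason
  ! to generate an implicit reduction for it.
  !$acc parallel loop &
  !$acc reduction(+:var1,var2,var3,var4) firstprivate(var5)
  do i = 1, n
    var1 = var1 + 1 * array(i)
    var2 = var2 + 2 * array(i)
    var3 = var3 + 3 * array(i)
    var4 = var4 + 4 * array(i)
    var5 = var5 + 5 * array(i)   ! touches a private copy only
  end do
  !$acc end parallel loop

  ! Only var1..var4 are meaningful here.
  print *, var1, var2, var3, var4
end program firstprivate_workaround
```

Compiling with -Minfo=accel, as in the original command line, should show whether the four explicit reductions are now all generated.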

Similarly, if you’re okay with the var5 auto reduction, using the kernels directive instead of parallel actually gets the right answer for reducing all the variables in your example case.
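A sketch of what the kernels variant might look like. It assumes the only change from the reproducer is swapping parallel for kernels and keeping the same reduction clause; the exact form that was tested is not stated in this thread. The program name and n are hypothetical.

```fortran
program kernels_variant
  implicit none
  integer, parameter :: n = 1000   ! hypothetical name, same size as the reproducer
  real :: array(n)
  real :: var1, var2, var3, var4, var5
  integer :: i

  array = 1.0
  var1 = 0.0
  var2 = 0.0
  var3 = 0.0
  var4 = 0.0
  var5 = 0.0

  ! Assumption: same clause as the reproducer, with kernels in place of parallel.
  ! var5 is still left to the compiler's automatic reduction detection.
  !$acc kernels loop &
  !$acc reduction(+:var1,var2,var3,var4)
  do i = 1, n
    var1 = var1 + 1 * array(i)
    var2 = var2 + 2 * array(i)
    var3 = var3 + 3 * array(i)
    var4 = var4 + 4 * array(i)
    var5 = var5 + 5 * array(i)
  end do
  !$acc end kernels loop

  print *, var1, var2, var3, var4, var5
end program kernels_variant
```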

With regard to your second extra question - it’s best practice to explicitly manage your data if you’re outside managed memory cases. However, the compiler tries its best to help you when it can. For example - you didn’t explicitly copy the variable “array” into your parallel region, but the compiler implicitly added a “copyin” clause. If you were to update “array”, the compiler would change that to an implicit “copy” clause. You can see all of this by closely reading the “-Minfo=acc” output at compile time.

Of course - for complicated parallel regions/kernel regions or sophisticated code - the compiler can make mistakes and miss things that would benefit you. This is why it’s best for you to explicitly manage the data yourself, if you’re not using managed memory mode.
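As a rough illustration of explicit data management for the example in this thread, the sketch below places the array on the device with unstructured data directives and names every accumulated variable in the reduction clause. The program name and n are hypothetical, and this is only one of several ways to arrange the data clauses.

```fortran
program explicit_data
  implicit none
  integer, parameter :: n = 1000   ! hypothetical name, same size as the reproducer
  real :: array(n)
  real :: var1, var2, var3, var4, var5
  integer :: i

  array = 1.0
  var1 = 0.0
  var2 = 0.0
  var3 = 0.0
  var4 = 0.0
  var5 = 0.0

  ! Copy the read-only input to the device explicitly.
  !$acc enter data copyin(array)

  ! present(array) states that the data is already on the device;
  ! all five accumulators are listed, so nothing is left implicit.
  !$acc parallel loop present(array) &
  !$acc reduction(+:var1,var2,var3,var4,var5)
  do i = 1, n
    var1 = var1 + 1 * array(i)
    var2 = var2 + 2 * array(i)
    var3 = var3 + 3 * array(i)
    var4 = var4 + 4 * array(i)
    var5 = var5 + 5 * array(i)
  end do
  !$acc end parallel loop

  ! Release the device copy; array was not modified, so no copy back is needed.
  !$acc exit data delete(array)

  print *, var1, var2, var3, var4, var5
end program explicit_data
```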

Does that help answer your questions?

Thanks for the answer. If one does not list any reduction clause at all, the compiler also generates implicit reductions, and those are correct. But here I wanted to state the reductions explicitly, and when I missed one variable, it broke. In my real code I also do my own memory management; I just skipped it in the example.

Greetings, Frank

Thanks for letting me understand your use case a little better! I didn’t even test leaving out the reduction clause entirely. I’ll update the bug report with that information!

Cheers,

Seth.

This also maps to NVBUG ID 4709303. We will triage the bug and report our conclusion back here.