Wrong results when using vector clause in parallel loop with array syntax

i.baccarelli · February 10, 2022, 3:34pm

Hello,
I work in the CINECA User Support Team, and we received from one of our users a report on a code which produces wrong numerical results. Please note that we found how to change the code so that it produces correct results, but we wonder whether the original code should work as well or, if not, how to explain why it is wrong.
Below you find the original Fortran OpenACC code posted by our user, and how he compiles it. It shows that:

- when using the vector clause for the parallel loop directive,
- if the loop instructions involve some operation on the array returned by a function call,
- as a result all threads produce the same value (the one corresponding to i=1); see output array a.
The code also reports the (correct) output for the array b, obtained by saving the return value of get_array in a local (array) variable c, and then operating on c.
Note that:
- without the vector clause it works
- with acc kernels instead of parallel loop it works
- with the vector clause and scalar variables (replacing the array get_arr) it works
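For reference, the scalar variant in the last point might look like the minimal sketch below. The user's actual scalar code was not posted, so the names simple_scalar, get_val and test_scalar are hypothetical and this is only our reading of that variant.

```fortran
module simple_scalar
contains
  ! Hypothetical scalar counterpart of get_arr: returns a single integer.
  function get_val(a)
  !$acc routine seq
    integer :: get_val
    integer, intent(in) :: a
    get_val = a
  end function get_val
end module simple_scalar

program test_scalar
  use simple_scalar
  implicit none
  integer, parameter :: n = 16
  integer, dimension(n) :: a
  integer :: i

  ! Same loop schedule as in the failing case, but with a scalar result.
  !$acc parallel loop gang worker vector copyout(a)
  do i = 1, n
    ! The function result is a scalar, so no array-valued result is involved.
    a(i) = get_val(i) * 2
  end do

  write (*,*) a
end program test_scalar
```
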
Is this an expected behaviour?
Many thanks in advance for any suggestion you may have,
best
Isabella

! compile with:
! nvfortran -c -o test.o test.F90 -cuda -acc -gpu=cc70 -Minfo=accel -g -r8 -traceback -Mnoinline
! nvfortran -o test test.o -cuda -acc -gpu=cc70 -Minfo=accel -g -r8 -traceback -Mnoinline

module simple
contains
  function get_arr(a)
  !$acc routine seq
    integer, dimension(2) :: get_arr
    integer, intent(in) :: a
    get_arr(1) = a
    get_arr(2) = a
  end function get_arr
end module simple


program testprogram
  use simple
  implicit none
  integer, parameter :: n = 16
  integer, dimension(n) :: a, b 
  integer, dimension(2) :: c
  integer :: i

  write (*,*) "test start"

  !$acc parallel loop gang worker vector &
  !$acc           private(c, i) copyout(a, b)
  do i = 1, n
    c = get_arr(i) * 2
    a(i) = c(1)
    c = get_arr(i)
    b(i) = c(1) * 2
  end do

  write (*,*) "result"
  write (*,*) "a"
  write (*,*)  a
  write (*,*) "b"
  write (*,*)  b
end program testprogram

MatColgrove · February 10, 2022, 6:00pm

Hi i.baccarelli,

I suspect the key as to what’s going on is found in the compiler feedback messages (-Minfo=accel):

% nvfortran test.F90 -acc -Minfo=accel -g ; a.out
get_arr:
      3, Generating acc routine seq
         Generating NVIDIA GPU code
testprogram:
     26, Generating copyout(a(:)) [if not already present]
         Generating NVIDIA GPU code
         29, !$acc loop gang, worker(4), vector(32) ! blockidx%x threadidx%y threadidx%x
         30, !$acc loop seq
     26, Local memory used for c
         Generating implicit copy(get_arr1(:)) [if not already present]
         Generating copyout(b(:)) [if not already present]
     30, Loop is parallelizable

It looks like the fixed-size local array “get_arr” is getting hoisted, causing the compiler to implicitly copy it to and from the device (the implicit copy of get_arr1 in the feedback above). Since it’s not also being implicitly privatized, all threads share it, which creates a potential race condition.

The wrong answers do seem to only appear when “-g” is used (another workaround is to remove -g), but if I’m correct and it is a race condition, the successful cases may just be due to luck in the timing of when get_arr is used.
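If the hoisted function result is indeed the problem, one way to sidestep it might be to avoid the array-valued function altogether and fill an explicitly private array from a subroutine instead. The following is only a minimal sketch under that assumption: the names simple_sub, set_arr and test_sub are hypothetical, and it has not been checked against the compiler version discussed here.

```fortran
module simple_sub
contains
  ! Hypothetical replacement for get_arr: fills a caller-provided array
  ! instead of returning an array-valued function result.
  subroutine set_arr(a, arr)
  !$acc routine seq
    integer, intent(in) :: a
    integer, dimension(2), intent(out) :: arr
    arr(1) = a
    arr(2) = a
  end subroutine set_arr
end module simple_sub

program test_sub
  use simple_sub
  implicit none
  integer, parameter :: n = 16
  integer, dimension(n) :: a
  integer, dimension(2) :: c
  integer :: i

  ! c is listed in the private clause, so each thread gets its own copy.
  !$acc parallel loop gang worker vector private(c) copyout(a)
  do i = 1, n
    call set_arr(i, c)
    a(i) = c(1) * 2
  end do

  write (*,*) a
end program test_sub
```

The idea is that the only array touched inside the loop is c, whose privatization is stated explicitly rather than left to the compiler.
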

I’ll need a compiler engineer to dig into the details to confirm if I’m correct, or if something else is going on. Hence, I added a problem report, TPR #31360.

Thanks for the report,
Mat

i.baccarelli · September 26, 2022, 10:10am

Dear Mat,
is there any news on your hypothesis regarding TPR #31360?
thank you for all your help,
Isabella

MatColgrove · September 26, 2022, 3:21pm

Hi Isabella,

Engineering did take a look and came to the same conclusion: “get_arr” isn’t getting implicitly privatized as it should. However, they gave the task a lower priority, so they haven’t assigned anyone to fix it yet. Let me talk to management and see if I can get it bumped higher.

-Mat

i.baccarelli · February 17, 2023, 11:31am

Dear Mat,
I was wondering whether you have received any additional feedback on this issue? Meanwhile I installed the 2023 suite of hpc-sdk, but nothing changed.
Thank you,
Isabella
