I work in the CINECA User Support Team, and one of our users sent us a report about a code that produces wrong numerical results. Please note that we found how to change the code so that it produces correct results, but we wonder whether the original code should work as well or, if not, how to explain why it is wrong.
Below you find the original Fortran OpenACC code posted by our user, together with the commands he uses to compile it. It shows that:
- when the vector clause is used on the parallel loop directive, and
- the loop body performs an operation directly on the array returned by a function call,
- all threads produce the same value (the one corresponding to i=1); see the output array a.
The code also reports the (correct) output for the array b, obtained by saving the return value of get_arr in a local (array) variable c, and then operating on c.
We also observed that:
- without the vector clause, it works
- with acc kernels instead of parallel loop, it works
- with the vector clause and scalar variables (in place of the array get_arr), it works
Is this expected behaviour?
Many thanks in advance for any suggestions you may have.
```fortran
! compile with:
! nvfortran -c -o test.o test.F90 -cuda -acc -gpu=cc70 -Minfo=accel -g -r8 -traceback -Mnoinline
! nvfortran -o test test.o -cuda -acc -gpu=cc70 -Minfo=accel -g -r8 -traceback -Mnoinline
module simple
contains
  function get_arr(a)
    !$acc routine seq
    integer, dimension(2) :: get_arr
    integer, intent(in) :: a
    get_arr(1) = a
    get_arr(2) = a
  end function get_arr
end module simple

program testprogram
  use simple
  implicit none
  integer, parameter :: n = 16
  integer, dimension(n) :: a, b
  integer, dimension(2) :: c
  integer :: i

  write (*,*) "test start"
  !$acc parallel loop gang worker vector &
  !$acc private(c, i) copyout(a, b)
  do i = 1, n
    c = get_arr(i) * 2
    a(i) = c(1)
    c = get_arr(i)
    b(i) = c(1) * 2
  end do

  write (*,*) "result"
  write (*,*) "a"
  write (*,*) a
  write (*,*) "b"
  write (*,*) b
end program testprogram
```
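For reference, the variant that gave correct results in our tests (the b path: store the function result in the private array c first, then operate on c) can be isolated as in the sketch below. The names workaround_mod, workaround_check and expected are hypothetical and only used for illustration; the host-side comparison against 2*i is our addition, so that a wrong result is flagged explicitly instead of having to be spotted in the printed array. We have only observed this form working with the compiler and flags listed above, so please treat it as a sketch rather than a guaranteed fix.

```fortran
! Sketch only: isolates the "b" path of the reproducer and adds a host-side check.
! workaround_mod, workaround_check and expected are hypothetical names.
module workaround_mod
contains
  function get_arr(a)
    !$acc routine seq
    integer, dimension(2) :: get_arr
    integer, intent(in) :: a
    get_arr(1) = a
    get_arr(2) = a
  end function get_arr
end module workaround_mod

program workaround_check
  use workaround_mod
  implicit none
  integer, parameter :: n = 16
  integer, dimension(n) :: b, expected
  integer, dimension(2) :: c
  integer :: i

  !$acc parallel loop gang worker vector &
  !$acc private(c, i) copyout(b)
  do i = 1, n
    c = get_arr(i)      ! store the function result in the private array first
    b(i) = c(1) * 2     ! then do the arithmetic on the local copy
  end do

  ! Host-side reference: every element should equal 2*i.
  do i = 1, n
    expected(i) = 2 * i
  end do

  if (all(b == expected)) then
    write (*,*) "b matches 2*i for all i"
  else
    write (*,*) "mismatch, b ="
    write (*,*) b
    error stop 1
  end if
end program workaround_check
```

If this is built without OpenACC enabled, the directives are treated as comments and the loop runs serially, so the check only says something about the GPU behaviour when the program is compiled with the same -acc options as the original test.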