Strange -O3 optimization result for nvfortran

I am porting an in-house Fortran code to OpenACC with the NVIDIA HPC SDK and got a strange result with -O3 optimization. The result can be reproduced with the following piece of code:

! bug_test.f90
program main
  implicit none
  integer i, na
  real, allocatable :: w(:), ww(:)
  real a

  na = 8
  allocate(w(na), ww(na))
  w = 1.
  ww = -1.

  !$acc kernels
  do i = 1, na
    a = w(i)
    w(i) = ww(i)
    ww(i) = a
  enddo
  !$acc end kernels

  write(*, *) w
  write(*, *)
  write(*, *) ww
end program

I need to swap the elements of two arrays. When I compile the code with:

nvfortran -acc -Minfo -r8 -O3 bug_test.f90

the program prints -1 for every element of both w and ww, whereas a correct swap should leave w all -1 and ww all 1.

However, the program works correctly when compiled with -O2 or -O.

The program even gives the correct output with -O3 when I target -acc=multicore or -acc=host. It seems that the compiler applies a strange optimization to the GPU code. For now I have to avoid -O3 in my code.
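One variation that might be worth trying in the meantime is to spell out the loop parallelism and the privatization of the temporary explicitly, and to check the result in the program itself. This is only a sketch: I have not verified that it avoids the -O3 problem, and the file name, the program name swap_check, and the "swap OK" / "swap WRONG" messages are made up for illustration.

```fortran
! swap_check.f90 (hypothetical file name)
program swap_check
  implicit none
  integer :: i, na
  real, allocatable :: w(:), ww(:)
  real :: a

  na = 8
  allocate(w(na), ww(na))
  w = 1.
  ww = -1.

  ! Same swap as in bug_test.f90, but using an explicit parallel loop
  ! with the temporary a declared private and the arrays copied in and out.
  ! Assumption: untested as a workaround for the -O3 issue.
  !$acc parallel loop private(a) copy(w, ww)
  do i = 1, na
    a = w(i)
    w(i) = ww(i)
    ww(i) = a
  enddo

  ! After a correct swap, w holds -1 everywhere and ww holds 1 everywhere.
  if (all(w == -1.) .and. all(ww == 1.)) then
    write(*, *) 'swap OK'
  else
    write(*, *) 'swap WRONG'
    write(*, *) w
    write(*, *) ww
  endif
end program swap_check
```

It can be built with the same flags as above, for example nvfortran -acc -Minfo -r8 -O3 swap_check.f90, and the -Minfo messages should show how the loop was scheduled and whether a was privatized.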

My OS is Ubuntu 18.04.2 LTS, the HPC SDK version is 21.3, and the CUDA version is 11.0.

Best regards

Thanks JieyunPan,

I’ve reproduced the issue here and filed problem report TPR #29984.

-Mat