OpenMP thread variables not set properly after inlining with -O2 optimization or higher with nvfortran

Hello,

There is a function in my OpenMP loop that should probably be inlined at higher optimization levels. If I leave the inlining decision to the nvfortran compiler, a NaN appears at -O2 or higher. However, if I add the -Minline flag, there is no issue. I am compiling with -Minit-real=snan, so I believe something has gone wrong with the shared memory, which is introducing the NaN. I am an OpenMP novice, so there is a good chance the error is mine.

I have created the following minimal working example to show the problem.

module my_mod
    integer, parameter :: realType = 8
contains
    
 subroutine outer_loop()
 
  real(kind=realType), dimension(2) :: my_array
  real(kind=realType) :: arr_elem, x, y, sqrt_x, z
  integer :: j

  my_array = [1.0, 2.0] 
  !$omp simd private(j, x, arr_elem, y, sqrt_x, z) 
  do j=1, 2

    write(*,*) 'j', j
    arr_elem = my_array(j)

    ! uncomment this and comment out `call my_inline_func` to manually inline
    ! x = arr_elem/2.1_REALTYPE
    call my_inline_func(arr_elem, x)
    
    ! do some math with x to compute z
    sqrt_x = sqrt(x)
    y = (sqrt_x)/(sqrt_x + 0.1_realType)
    z = y*y  - 0.1_realType/y
    print*, 'arr_elem', arr_elem
    print*, 'x', x
    print*, 'sqrt_x', sqrt_x
    print*, 'y', y
    print*, 'z', z
    print*, ''
    
  end do
contains

  subroutine my_inline_func(a, b)
    real(kind=realType), intent(in) :: a
    real(kind=realType), intent(out) ::  b
    b = a/2.1_realType
  end subroutine my_inline_func

end subroutine outer_loop

end module my_mod

program main
    use my_mod
    call outer_loop()
end

These ways of compiling work

  1. adding inlining explicitly
    nvfortran -O3 -mp -Minit-real=snan -Minline my_omp_reproducer.F90 -o omp_test
  2. lower the optimization level
    nvfortran -O1 -mp -Minit-real=snan my_omp_reproducer.F90 -o omp_test
    nvfortran -O0 -mp -Minit-real=snan my_omp_reproducer.F90 -o omp_test
  3. remove openmp
    nvfortran -O3 -Minit-real=snan my_omp_reproducer.F90 -o omp_test
  4. manually inline
    comment out call my_inline_func(arr_elem, x) and replace with x = arr_elem/2.1_REALTYPE

These ways of compiling do not work

  1. rely on default optimization options
    nvfortran -O3 -mp -Minit-real=snan my_omp_reproducer.F90 -o omp_test
    nvfortran -O2 -mp -Minit-real=snan my_omp_reproducer.F90 -o omp_test

I am using NVHPC 24.1 with an Intel(R) Xeon(R) Gold 6152 CPU.

Thanks!

Thanks for the report!

It looks to me to be a problem with the “simd” directive when there’s a subroutine call in the region that gets passed private variables.

I’ve created a problem report, TPR#35918, and sent it to engineering for investigation.

Besides inlining, another workaround would be to not use “simd” and instead use “parallel do”. The compiler will still be able to vectorize the code without the hint (though the loop trip count is too short to vectorize in this particular example).
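
As a rough sketch of the “parallel do” workaround, assuming the rest of the reproducer stays as posted: only the directive changes, and the private list is kept as it was. The module name my_mod_parallel is made up for this illustration, and the per-iteration prints are condensed into one line. With more than one thread, the output lines may appear in any order.

```fortran
module my_mod_parallel   ! hypothetical name, not from the original post
  implicit none
  integer, parameter :: realType = 8
contains

  subroutine outer_loop()
    real(kind=realType), dimension(2) :: my_array
    real(kind=realType) :: arr_elem, x, y, sqrt_x, z
    integer :: j

    my_array = [1.0_realType, 2.0_realType]

    ! "parallel do" replaces "simd"; the private list is unchanged
    !$omp parallel do private(j, x, arr_elem, y, sqrt_x, z)
    do j = 1, 2
      arr_elem = my_array(j)

      ! the private variable x is still passed to the subroutine
      call my_inline_func(arr_elem, x)

      sqrt_x = sqrt(x)
      y = sqrt_x/(sqrt_x + 0.1_realType)
      z = y*y - 0.1_realType/y
      print *, 'j', j, 'x', x, 'z', z
    end do
    !$omp end parallel do

  contains

    subroutine my_inline_func(a, b)
      real(kind=realType), intent(in)  :: a
      real(kind=realType), intent(out) :: b
      b = a/2.1_realType
    end subroutine my_inline_func

  end subroutine outer_loop

end module my_mod_parallel

program main
  use my_mod_parallel
  call outer_loop()
end program main
```

This can be built with the same flags as the failing case from the original post, for example nvfortran -O2 -mp -Minit-real=snan, to check whether the NaN still appears.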

Another option is to make the subroutine a function so the private variable doesn’t need to get passed:

...
    x = my_inline_func(arr_elem)
...
contains

  function my_inline_func(a) result(b)
    real(kind=realType), value :: a
    real(kind=realType) ::  b
    b = a/2.1_realType
  end function my_inline_func
...
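
For reference, here is one way the function variant might look when put back into the full reproducer. This is only a sketch under the assumption that everything else stays as in the original post; the module name my_mod_func is made up, and the prints are condensed into one line.

```fortran
module my_mod_func   ! hypothetical name, not from the original post
  implicit none
  integer, parameter :: realType = 8
contains

  subroutine outer_loop()
    real(kind=realType), dimension(2) :: my_array
    real(kind=realType) :: arr_elem, x, y, sqrt_x, z
    integer :: j

    my_array = [1.0_realType, 2.0_realType]

    ! same "simd" directive as in the original reproducer
    !$omp simd private(j, x, arr_elem, y, sqrt_x, z)
    do j = 1, 2
      arr_elem = my_array(j)

      ! x is assigned from the function result rather than
      ! being passed to the callee as an intent(out) argument
      x = my_inline_func(arr_elem)

      sqrt_x = sqrt(x)
      y = sqrt_x/(sqrt_x + 0.1_realType)
      z = y*y - 0.1_realType/y
      print *, 'j', j, 'x', x, 'z', z
    end do

  contains

    function my_inline_func(a) result(b)
      real(kind=realType), value :: a   ! passed by value
      real(kind=realType) :: b
      b = a/2.1_realType
    end function my_inline_func

  end subroutine outer_loop

end module my_mod_func

program main
  use my_mod_func
  call outer_loop()
end program main
```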