Hello,
There is a subroutine called inside my OpenMP loop that should probably be inlined at higher optimization levels. If I leave the inlining decision to nvfortran, a NaN appears at -O2 or higher. However, if I add the -Minline flag, there is no issue. I am compiling with -Minit-real=snan, so I believe something has gone wrong with the shared memory and is introducing the NaN. I am an OpenMP novice, so there is a good chance the error is mine.
I have created the following minimal working example to show the problem.
```fortran
module my_mod
  integer, parameter :: realType = 8
contains
  subroutine outer_loop()
    real(kind=realType), dimension(2) :: my_array
    real(kind=realType) :: arr_elem, x, y, sqrt_x, z
    integer :: j
    my_array = [1.0, 2.0]
    !$omp simd private(j, x, arr_elem, y, sqrt_x, z)
    do j=1, 2
      write(*,*) 'j', j
      arr_elem = my_array(j)
      ! uncomment this and comment out `call my_inline_func` to manually inline
      ! x = arr_elem/2.1_REALTYPE
      call my_inline_func(arr_elem, x)
      ! do some math with x to compute z
      sqrt_x = sqrt(x)
      y = (sqrt_x)/(sqrt_x + 0.1_realType)
      z = y*y - 0.1_realType/y
      print*, 'arr_elem', arr_elem
      print*, 'x', x
      print*, 'sqrt_x', sqrt_x
      print*, 'y', y
      print*, 'z', z
      print*, ''
    end do
  contains
    subroutine my_inline_func(a, b)
      real(kind=realType), intent(in) :: a
      real(kind=realType), intent(out) :: b
      b = a/2.1_realType
    end subroutine my_inline_func
  end subroutine outer_loop
end module my_mod

program main
  use my_mod
  call outer_loop()
end
```
These ways of compiling work:
- adding inlining explicitly
nvfortran -O3 -mp -Minit-real=snan -Minline my_omp_reproducer.F90 -o omp_test
- lower the optimization level
nvfortran -O1 -mp -Minit-real=snan my_omp_reproducer.F90 -o omp_test
nvfortran -O0 -mp -Minit-real=snan my_omp_reproducer.F90 -o omp_test
- remove openmp
nvfortran -O3 -Minit-real=snan my_omp_reproducer.F90 -o omp_test
- manually inline
comment out `call my_inline_func(arr_elem, x)`
and replace it with `x = arr_elem/2.1_REALTYPE`
These ways of compiling do not work:
- relying on the compiler's default inlining decisions
nvfortran -O3 -mp -Minit-real=snan my_omp_reproducer.F90 -o omp_test
nvfortran -O2 -mp -Minit-real=snan my_omp_reproducer.F90 -o omp_test
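A possible way to make each build above report pass/fail without reading the printed output is a self-checking variant of the reproducer. This is only a sketch under my own assumptions: it uses `ieee_is_nan` from the standard `ieee_arithmetic` intrinsic module, the names `nan_check_mod` and `count_nan_iterations` are hypothetical, and the loop body is copied from the reproducer with the print statements removed, which may itself change what the optimizer does and hide the problem.

```fortran
! Hypothetical self-checking variant of the reproducer (names are illustrative).
module nan_check_mod
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan
  implicit none
  integer, parameter :: realType = 8
contains
  ! Returns how many loop iterations produced a NaN in any intermediate value.
  function count_nan_iterations() result(n_bad)
    integer :: n_bad
    real(kind=realType), dimension(2) :: my_array
    real(kind=realType) :: arr_elem, x, y, sqrt_x, z
    integer :: j, bad
    my_array = [1.0_realType, 2.0_realType]
    bad = 0
    ! Same loop as the reproducer with the prints replaced by NaN checks;
    ! the counter is combined across SIMD lanes with a reduction clause.
    !$omp simd private(x, arr_elem, y, sqrt_x, z) reduction(+:bad)
    do j = 1, 2
      arr_elem = my_array(j)
      call my_inline_func(arr_elem, x)
      sqrt_x = sqrt(x)
      y = sqrt_x/(sqrt_x + 0.1_realType)
      z = y*y - 0.1_realType/y
      if (ieee_is_nan(x) .or. ieee_is_nan(sqrt_x) .or. &
          ieee_is_nan(y) .or. ieee_is_nan(z)) bad = bad + 1
    end do
    n_bad = bad
  contains
    ! Same internal subroutine as in the reproducer.
    subroutine my_inline_func(a, b)
      real(kind=realType), intent(in) :: a
      real(kind=realType), intent(out) :: b
      b = a/2.1_realType
    end subroutine my_inline_func
  end function count_nan_iterations
end module nan_check_mod
```

A short driver program that does `use nan_check_mod` and stops with a nonzero status when `count_nan_iterations()` is not zero could then be built with each command from the two lists; a nonzero count would flag the affected flag combinations.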
I am using NVHPC 24.1 with an Intel(R) Xeon(R) Gold 6152 CPU.
Thanks!