Dear developers,
I am parallelizing one of my Fortran software using Openacc. Most of the sections are working quite well, except for one section where I called several subroutine within a parallel region, I found NaN return from the call. To illustrate this idea, I will post a light-weighted version of my software for reference
MODULE TVD_MODULE
USE DEFINITION
IMPLICIT NONE
contains
SUBROUTINE TVDVL_reconx
USE RIEMANN_MODULE
USE DEFINITION
IMPLICIT NONE
! Integer !
INTEGER :: j, k, l, i, p
!$ACC ROUTINE (EOSEPSILON_NM) SEQ
!$ACC ROUTINE (EOSSOUNDSPEED) SEQ
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! We first interpolate density !
!$OMP PARALLEL
!$OMP DO COLLAPSE(4) SCHEDULE (STATIC)
!$ACC PARALLEL DEFAULT (PRESENT)
!$ACC LOOP GANG WORKER VECTOR COLLAPSE(4)
DO l = nz_min_2 - 1, nz_part_2 + 1
DO k = ny_min_2 - 1, ny_part_2 + 1
DO j = nx_min_2 - 1, nx_part_2 + 1
DO i = imin2, imax2
CALL TVD_VL (x2cen(j-1), x2cen(j), x2cen(j+1), xF2(j-1), xF2(j), dx2(j), &
prim2(i,j-1,k,l), prim2(i,j,k,l), prim2(i,j+1,k,l), &
primR2(i,j-1,k,l), primL2(i,j,k,l))
END DO
END DO
END DO
END DO
!$OMP END DO
! Get energy and sound speed !
!$OMP DO COLLAPSE(3) SCHEDULE (STATIC)
!$ACC LOOP GANG WORKER VECTOR COLLAPSE(3)
DO l = nz_min_2 - 1, nz_part_2 + 1
DO k = ny_min_2 - 1, ny_part_2 + 1
DO j = nx_min_2 - 1, nx_part_2
CALL EOSEPSILON_NM (primR2(irho2,j,k,l), primR2(itau2,j,k,l), eps2R(j,k,l))
CALL EOSEPSILON_NM (primL2(irho2,j,k,l), primL2(itau2,j,k,l), eps2L(j,k,l))
CALL EOSSOUNDSPEED (primR2(itau2,j,k,l), primR2(irho2,j,k,l), eps2R(j,k,l), cs2R(j,k,l))
CALL EOSSOUNDSPEED (primL2(itau2,j,k,l), primL2(irho2,j,k,l), eps2L(j,k,l), cs2L(j,k,l))
END DO
END DO
END DO
!$OMP END DO
END IF
!$ACC END PARALLEL
!$OMP END PARALLEL
!$ACC SERIAL
WRITE (*,*) primL2(itau2,25,49,0), primL2(irho2,25,49,0), eps2L(25,49,0), cs2L(25,49,0), 'left 0'
WRITE (*,*) primL2(itau2,25,49,1), primL2(irho2,25,49,1), eps2L(25,49,1), cs2L(25,49,1), 'left 1'
WRITE (*,*) primL2(itau2,25,49,2), primL2(irho2,25,49,2), eps2L(25,49,2), cs2L(25,49,2), 'left 2'
WRITE (*,*) primR2(itau2,25,49,0), primR2(irho2,25,49,0), eps2R(25,49,0), cs2R(25,49,0), 'right 0'
WRITE (*,*) primR2(itau2,25,49,1), primR2(irho2,25,49,1), eps2R(25,49,1), cs2R(25,49,1), 'right 1'
WRITE (*,*) primR2(itau2,25,49,2), primR2(irho2,25,49,2), eps2R(25,49,2), cs2R(25,49,2), 'right 2'
WRITE (*,*)
!$ACC END SERIAL
STOP
END SUBROUTINE
END MODULE
So all arrays are properly transferred to the device. The ACC SERIAL region is for me to check if there are unreasonable output from the loop. I find that eps2L(25,49,1), cs2L(25,49,1), eps2R(25,49,1), cs2R(25,49,1) are all NaNs. I pin-point the source of NaN is from the subroutine EOSSOUNDSPEED and EOSEPSILON_NM, let me show them here
SUBROUTINE EOSSOUNDSPEED (p_in, rho_in, eps_in, cs_out)
!$ACC ROUTINE SEQ
USE DEFINITION
IMPLICIT NONE
! Input density !
REAL*8, INTENT (IN) :: p_in, rho_in, eps_in
! Output value !
REAL*8, INTENT (OUT) :: cs_out
! Local variables !
REAL*8 :: dpdden, dpdeps
dpdden = eps_in * (ggas2 - 1.0D0)
dpdeps = rho_in * (ggas2 - 1.0D0)
cs_out = sqrt(dpdden+dpdeps*p_in/rho_in**(2.0D0))
END SUBROUTINE
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
SUBROUTINE EOSEPSILON_NM (rho_in, p_in, eps_out)
!$ACC ROUTINE SEQ
USE DEFINITION
IMPLICIT NONE
! Input density !
REAL*8, INTENT (IN) :: rho_in, p_in
! Output value !
REAL*8, INTENT (OUT) :: eps_out
eps_out = p_in/rho_in/(ggas2 - 1.0D0)
END SUBROUTINE
ggas2 is a variable from the definition module and has been properly declared using !$ACC DECLARE before the start of the program. Also, note that EOSSOUNDSPEED and EOSEPSILON_NM are not in the tvd_module but are in another separate file. I try a number of combinations to look for where the issue is.
- most trivial solution, mute the CALL EOSSOUNDSPEED and CALL EOSEPSILON_NM statement - successful
- move the EOSSOUNDSPEED and EOSSOUNDSPEED subroutine to the tvd_module - fail
- mute anything related to ggas2, i.e. eps_out = p_in/rho_in!/(ggas2 - 1.0D0) - fail
- print out the value of eps_out and cs_out in both subroutines - successful (huh???)
- replace all division operators in EOSSOUNDSPEED and EOSSOUNDSPEED with the multiplication operator - sucessful
I produce two successful cases, but the pattern is very non-trivial. Any clue about what is happening? I am using HPC_SDK 22.3, and built a hdf5 fortran (h5fc) compiler using the pgi compiler. The program is compiled with the corresponding h5fc compiler.