Bug in nvfortran 22.1-0 64-bit target on x86-64 Linux -tp zen3

I have an OpenACC Fortran code that uses two modules. If the modules are in the same file, it works fine. But if the modules are in separate files, it yields the wrong answer. I show the original code below.

There are two modules, heatx_mod and test_mod. Both are in the file par.f90. If you put test_mod in a different file, say test.f90, then the code yields the wrong answer. And if you put everything in submodules, it fails too:

heatx.f90

!--------------------------------------
!                                     |
!--------------------------------------
        program heatx
        use heatx_mod
        use test_mod
        !$acc routine(calc) seq
        allocate(x(n,n,n))
        x=4.0
        xtot=0.0
        !$acc update device(x,xtot)

        call test()
        
        !$acc update host(x,xtot)
        print*,'+------------------------------------------+'
        print*,'|                                          |'
        print*,'+------------------------------------------+'
        print'(1x,a,i1,2f15.7)','%heatx, avg  10^',int(log10(float(size(x)))),sum(x)/size(x),xtot/size(x)
        print*
        end program

par.f90

!--------------------------------------
!                                     |
!--------------------------------------
        module heatx_mod
        integer, parameter   :: n=100
        real   , allocatable :: x(:,:,:)
        real                 :: xtot
        !$acc declare create(x,xtot)
        
        contains
!--------------------------------------
!                                     |
!--------------------------------------
        real function sqab(a)
        !$acc routine seq
        real :: a
        sqab = sqrt(abs(a))
        end function
!--------------------------------------
!                                     |
!--------------------------------------
        subroutine calc(i,j,k)
        integer :: i,j,k
        !$acc routine(sqab) seq
        !$acc routine seq
        x(i,j,k) = sqab(x(i,j,k))
        xtot = xtot + x(i,j,k)
        end subroutine

        end module

!--------------------------------------
!                                     |
!--------------------------------------
        module test_mod
        contains
        subroutine test()
        use heatx_mod
        !$acc routine(calc) seq
        !$acc parallel loop reduction(+:xtot)
        do k=1,n
        !$acc loop reduction(+:xtot)
        do j=1,n
        !$acc loop reduction(+:xtot)
        do i=1,n
                call calc(i,j,k)
        end do
        end do
        end do
        end subroutine

        end module

Hi Robert,

It’s not a compiler bug but an issue with your program.

The problem is that “xtot” within “calc” accesses the global device variable, which causes a race condition. The reduction is in a separate program unit, so the compiler can’t associate the two. It will only work if “calc” gets inlined, since the compiler can then make the association. The back-end device code generator will implicitly inline device routines when they are in the same file as the compute kernel, i.e. the “parallel loop”, which is why it works in that case.

You can use the “-Mextract”/“-Minline” flags so “calc” is inlined, but it may be better to pass “xtot” as an argument to “calc” so each thread’s private partial reduction “xtot” variable is passed in.
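A minimal sketch of the second option, assuming the interface of “calc” can be changed to take the accumulator as an extra argument. The dummy-argument name “xsum” and the four-argument signature are hypothetical and not part of the original code, and this has not been run against nvfortran 22.1:

```fortran
! Sketch only: calc receives the reduction variable as an argument
! instead of updating the module-level device copy of xtot directly.
! "xsum" is a hypothetical dummy-argument name, not in the original code.
module heatx_mod
  implicit none
  integer, parameter :: n = 100
  real, allocatable  :: x(:,:,:)
  real               :: xtot
  !$acc declare create(x, xtot)
contains
  real function sqab(a)
    !$acc routine seq
    real, intent(in) :: a
    sqab = sqrt(abs(a))
  end function

  subroutine calc(i, j, k, xsum)
    !$acc routine seq
    integer, intent(in) :: i, j, k
    real, intent(inout) :: xsum      ! caller's partial sum
    x(i,j,k) = sqab(x(i,j,k))
    xsum = xsum + x(i,j,k)           ! no direct reference to global xtot
  end subroutine
end module

module test_mod
contains
  subroutine test()
    use heatx_mod
    implicit none
    integer :: i, j, k
    !$acc parallel loop reduction(+:xtot)
    do k = 1, n
      !$acc loop reduction(+:xtot)
      do j = 1, n
        !$acc loop reduction(+:xtot)
        do i = 1, n
          ! xtot here is the reduction variable of the enclosing loops
          call calc(i, j, k, xtot)
        end do
      end do
    end do
  end subroutine
end module
```

With this change the reduction variable is named at the call site, inside the same program unit as the “parallel loop”, so the result should no longer depend on whether “calc” happens to be inlined or on which file “test_mod” lives in.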

Example of using two-pass inlining for multiple files. First extract the inline info, then use the info to inline:

% nvfortran -c par.f90 -acc -Mextract=lib:inlib
% nvfortran -c heatx.f90 -acc -Mextract=lib:inlib
% nvfortran -c par.f90 -acc -Minline=lib:inlib
% nvfortran -c heatx.f90 -acc -Minline=lib:inlib
% nvfortran -acc heatx.o par.o
% a.out
 +------------------------------------------+
 |                                          |
 +------------------------------------------+
 %heatx, avg  10^6      2.0000000      2.0000000

Hope this helps,
Mat

Mat,

Thanks for your quick response!!! I see where my error was.

This code is a toy code for the 3D Jacobi Solver, which was a toy code for my larger FAST3D code. The -Mextract/-Minline approach worked for the Jacobi Solver, but not for FAST3D: even without the OpenACC directives, it gives the wrong answer. So just now, for FAST3D, I placed all the subroutines in the main file, so I only have one file to compile, and that gives an Internal Compiler Error. So I am debugging further. -Rob