[nvfortran] Compiler bug when targeting multicore CPUs with OpenACC

The following program either produces wrong results or crashes with a segmentation fault when compiled with -acc=multicore. It works correctly when compiled for GPUs with -acc=gpu or when compiled without the -acc flag. Removing the empty if statement avoids the issue. Tested with nvfortran 22.5.

module zoo
    implicit none

    character(len = 777) :: wildcard
    integer :: ji, jj, jk, jpi, jpj, jpk
    real(kind = 8), allocatable, dimension(:,:), public :: bamboo
    real(kind = 8), allocatable, dimension(:,:,:), public :: grass

    contains
    ! Set up the problem size and initialize the arrays on the host.
    subroutine elephant
        wildcard = "capybara"

        jpi = 362
        jpj = 332
        jpk = 74

        allocate(bamboo(jpi, jpj), grass(jpi, jpj, jpk))
        bamboo = 0
        grass = 1
    end subroutine elephant

    ! Accumulate grass into bamboo along the third dimension.
    subroutine redpanda
        !$acc kernels
        do jk = 1, jpk
            bamboo(:,:) = bamboo(:,:) + grass(:,:,jk)
        enddo
        ! Empty if statement: removing it makes -acc=multicore work.
        if (wildcard == "redpanda") then
        end if
        !$acc end kernels
    end subroutine redpanda
end module zoo

program main
    use zoo
    implicit none

    call elephant
    call redpanda

    WRITE(*, *) '> ', sum(bamboo)
end program main

Thank you 🙂

Hi nmnobre,

I’m debating what to do here. There’s definitely a compiler issue, but the use case is somewhat invalid given that the “jk” loop is not parallelizable, scalar code shouldn’t really be put in a kernels region, and the if statement is empty. I can submit an issue report, but engineering is likely to set its priority to low, and it may not get fixed for a while, if at all.

Is this example derived from a larger application? Would a workaround, such as moving the “end kernels” directive to just after the “end do”, be acceptable?
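A minimal sketch of the workaround suggested above, assuming the rest of the reproducer stays the same: the kernels region is closed right after the “jk” loop, so the empty if statement runs on the host outside the compute region. This has not been verified against nvfortran 22.5. The final check (every bamboo element should equal jpk, so the total is jpi*jpj*jpk) and the 'PASS' message are additions for illustration only.

```fortran
module zoo
    implicit none

    character(len = 777) :: wildcard
    integer :: ji, jj, jk, jpi, jpj, jpk
    real(kind = 8), allocatable, dimension(:,:), public :: bamboo
    real(kind = 8), allocatable, dimension(:,:,:), public :: grass

contains
    subroutine elephant
        wildcard = "capybara"
        jpi = 362
        jpj = 332
        jpk = 74
        allocate(bamboo(jpi, jpj), grass(jpi, jpj, jpk))
        bamboo = 0
        grass = 1
    end subroutine elephant

    subroutine redpanda
        ! Compute region now contains only the jk loop and its array syntax.
        !$acc kernels
        do jk = 1, jpk
            bamboo(:,:) = bamboo(:,:) + grass(:,:,jk)
        enddo
        !$acc end kernels
        ! The scalar if statement is moved outside the kernels region.
        if (wildcard == "redpanda") then
        end if
    end subroutine redpanda
end module zoo

program main
    use zoo
    implicit none

    call elephant
    call redpanda

    WRITE(*, *) '> ', sum(bamboo)
    ! Illustrative self-check: each element is jpk, so the sum is jpi*jpj*jpk.
    if (sum(bamboo) /= real(jpi, 8) * real(jpj, 8) * real(jpk, 8)) error stop 1
    WRITE(*, *) 'PASS'
end program main
```

All partial sums here are integer-valued and far below 2**53, so the exact floating-point comparison holds regardless of summation order.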

Thanks,
Mat

Hi Mat,

The “jk” loop is indeed not parallelizable, but the implicit inner loops are, and -Minfo=accel correctly reports that. The if statement is purposely empty to, I guess, show the extent of the bug: even with code that could be dead-code eliminated, the compiler fails to do its job. I hope this is enough to convince engineering to set the priority to high 🙂
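To make the point about the implicit inner loops concrete, here is a hypothetical rewrite of redpanda with the array syntax written out as explicit loops. The (i, j) iterations are independent and are distributed with collapse(2), while the accumulation over k stays sequential via loop seq; the empty if statement is kept on the host after the loop nest. The local indices i, j, k and the final self-check are assumptions for illustration, and this sketch has not been tested against the 22.5 bug.

```fortran
module zoo
    implicit none

    character(len = 777) :: wildcard
    integer :: jpi, jpj, jpk
    real(kind = 8), allocatable, dimension(:,:), public :: bamboo
    real(kind = 8), allocatable, dimension(:,:,:), public :: grass

contains
    subroutine elephant
        wildcard = "capybara"
        jpi = 362
        jpj = 332
        jpk = 74
        allocate(bamboo(jpi, jpj), grass(jpi, jpj, jpk))
        bamboo = 0
        grass = 1
    end subroutine elephant

    subroutine redpanda
        integer :: i, j, k  ! hypothetical local loop indices

        ! Independent (i, j) iterations are collapsed and parallelized;
        ! the dependent accumulation over k runs sequentially per (i, j).
        !$acc parallel loop collapse(2) copy(bamboo) copyin(grass)
        do j = 1, jpj
            do i = 1, jpi
                !$acc loop seq
                do k = 1, jpk
                    bamboo(i, j) = bamboo(i, j) + grass(i, j, k)
                end do
            end do
        end do

        ! Scalar code stays outside the compute region.
        if (wildcard == "redpanda") then
        end if
    end subroutine redpanda
end module zoo

program main
    use zoo
    implicit none

    call elephant
    call redpanda

    WRITE(*, *) '> ', sum(bamboo)
    ! Illustrative self-check: each element is jpk, so the sum is jpi*jpj*jpk.
    if (sum(bamboo) /= real(jpi, 8) * real(jpj, 8) * real(jpk, 8)) error stop 1
    WRITE(*, *) 'PASS'
end program main
```

With -acc=multicore the data clauses have no effect; with -acc=gpu they move bamboo and grass to and from the device.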

-Nuno