OpenMP 4.5 and Fortran Type bound procedure

Dear,

Please find this modern Fortran program below with GPU offload using OpenMP

Using type bound procedure for my unk derived type generates a Segmentation fault (core dumped)

nvfortran --version
nvfortran 23.5-0 linuxarm64 target on aarch64 Linux -tp neoverse-n1
nvfortran -o exe -Minfo=mp -mp=gpu test.f90

If I break the type bound procedure and use classical SUBROUTINE with type(unk) as argument, it works
But I need to define map(to: sw, sw%h, sw%u ... and I don’t understand why sw is needed
If not, there is this error message:

Failing in Thread:1
Accelerator Fatal Error: call to cuMemcpyDtoHAsync returned error 700: Illegal address during kernel execution

Thanks very much for your answers :)

MODULE m_sw

   USE iso_fortran_env, only: ip => INT32, rp => REAL64

   implicit none

   TYPE :: unk

      real(rp), allocatable :: h(:)
      real(rp), allocatable :: u(:)
      real(rp), allocatable :: v(:)

   CONTAINS

      procedure, pass(sw) :: init => init_sw
      procedure, pass(sw) :: run => run_sw

   END TYPE unk

CONTAINS

   SUBROUTINE init_sw(sw, n)

      class(unk), intent(inout) :: sw
      integer(ip), intent(in) :: n

      allocate (sw%h(n), source=1._rp)
      allocate (sw%u(n), source=0._rp)
      allocate (sw%v(n), source=0._rp)

      !$omp target enter data map(to: sw, sw%h, sw%u, sw%v)

   END SUBROUTINE

   SUBROUTINE run_sw(sw, n, nt)

      class(unk), intent(inout) :: sw
      integer(ip), intent(in) :: n , nt

      integer(ip) :: ite , ic

      do ite = 1,nt
         !$omp target teams distribute parallel do
         do ic = 2,n-1
            sw%h(ic) = sw%h(ic) - ( sw%h(ic+1) * sw%u(ic+1) )
         end do
      end do
      !$omp exit target data map(delete: sw, sw%h, sw%u, sw%v)

   END SUBROUTINE

END MODULE m_sw

PROGRAM test

   USE m_sw

   implicit none

   type(unk) :: sw

   integer(ip), parameter :: n = 100
   integer(ip), parameter :: nt = 1000

   call sw%init( sw , n )

   call sw%run( sw , n , nt )

END PROGRAM test

Hi fcouderc, thanks for the report!

I was able to reproduce the error so filed TPR #33957 and sent it to engineering for review.

Note that the code doesn’t compile as is and needed a few fixes. Also, the loop has a forward dependency so not parallelizable, but that’s not the cause of the segv. The segv still occurs after I fix this.

One possible work around is to use OpenACC, which seems to work correctly. Here’s my modified code:

% cat test.F90
MODULE m_sw

   USE iso_fortran_env, only: ip => INT32, rp => REAL64

   implicit none

   TYPE :: unk

      real(rp), allocatable :: h(:)
      real(rp), allocatable :: hold(:)
      real(rp), allocatable :: u(:)
      real(rp), allocatable :: v(:)

   CONTAINS

      procedure, pass(sw) :: init => init_sw
      procedure, pass(sw) :: run => run_sw

   END TYPE unk

CONTAINS

   SUBROUTINE init_sw(sw, n)

      class(unk), intent(inout) :: sw
      integer(ip), intent(in) :: n

      allocate (sw%h(n), source=1._rp)
      allocate (sw%hold(n), source=1._rp)
      allocate (sw%u(n), source=4._rp)
      allocate (sw%v(n), source=3._rp)

   END SUBROUTINE

   SUBROUTINE run_sw(sw, n, nt)

      class(unk), intent(inout) :: sw
      integer(ip), intent(in) :: n , nt

      integer(ip) :: ite , ic
!$omp target data map(to:sw,sw%u) map(tofrom:sw%h) map(alloc:sw%hold)
!$acc data copyin(sw,sw%u) copy(sw%h) create(sw%hold)
      do ite = 1,nt
         sw%hold=sw%h
!$omp target update to(sw%hold)
!$acc update device(sw%hold)
         !$omp target teams loop
         !$acc parallel loop
         do ic = 2,n-1
            sw%h(ic) = sw%h(ic) - ( sw%hold(ic+1) * sw%u(ic+1) )
         end do
      end do
!$omp end target data
!$acc end data

   END SUBROUTINE

END MODULE m_sw

PROGRAM test

   USE m_sw

   implicit none

   type(unk) :: sw

   integer(ip), parameter :: n = 100
   integer(ip), parameter :: nt = 1

   !call init_sw( sw , n )
   call sw%init(  n )

   !call run_sw( sw , n , nt )
   call sw%run( n , nt )
   print *, sw%h(1:10)

END PROGRAM test
% nvfortran test.F90 -mp=gpu  ; a.out
Segmentation fault (core dumped)
% nvfortran test.F90 -acc=gpu ; a.out
    1.000000000000000        -3.000000000000000        -3.000000000000000
   -3.000000000000000        -3.000000000000000        -3.000000000000000
   -3.000000000000000        -3.000000000000000        -3.000000000000000
   -3.000000000000000

-Mat

1 Like