OpenACC procedure pointers (Fortran)

Dear all,

we use procedure pointers to handle a set of different schemes implemented in a CFD solver. However, we have run into problems compiling our code (see the snippet below). We understand that offloading procedure pointers may not currently be supported (at least with nvfortran; gfortran seems to have a workaround), so we have a couple of questions:

  1. Can procedure pointers not be offloaded to the device, or is our approach wrong (see the example below)? If our approach is wrong, could you point us to the correct way to offload procedure pointers?

  2. Assuming that offloading procedure pointers is not currently supported, do you have any suggestions for handling multiple scheme implementations other than the (ab)use of preprocessing flags?

Thank you in advance for your help; it is much appreciated.
Kind regards,
Stefano

Minimal Working Example

program test_pointer_procedure

use openacc
implicit none

integer, parameter :: ni = 1000
integer            :: i
real               :: a(ni)

procedure(subroutine_template), pointer :: do_work

interface
   subroutine subroutine_template(i, x)
   integer, intent(in)  :: i
   real,    intent(out) :: x
   endsubroutine subroutine_template
endinterface

do_work => do_work_ok
!$acc enter data create(a, do_work)
!$acc parallel loop present(a)
do i=1, ni
   call do_work(i=i, x=a(i))
enddo
!$acc exit data copyout(a) delete(do_work)
print *, ' work ok', maxval(a)
do_work => do_work_ko
!$acc enter data create(a)
!$acc parallel loop present(a, do_work)
do i=1, ni
   call do_work(i=i, x=a(i))
enddo
!$acc exit data copyout(a) delete(do_work)
print *, ' work ko', maxval(a)

contains
   subroutine do_work_ok(i, x)
   integer, intent(in)  :: i
   real,    intent(out) :: x
   !$acc routine(do_work_ok)
   x = real(i)
   endsubroutine do_work_ok

   subroutine do_work_ko(i, x)
   integer, intent(in)  :: i
   real,    intent(out) :: x
   !$acc routine(do_work_ko)
   x = -real(i)
   endsubroutine do_work_ko
endprogram test_pointer_procedure

Compiling with gfortran (v14.02) and running, we got the correct output:

└──────╼ ./test_pointer_procedure
  work ok   1000.00000
  work ko  -1.00000000

Compiling with nvfortran (v24.11-0), we got an ICE:

└──────╼ nvfortran -acc -gpu=cc89 -fast -Minfo=all test_pointer_procedure.f90 -o test_pointer_procedure
NVFORTRAN-S-0000-Internal compiler error. size_of: bad dtype       39  (test_pointer_procedure.f90: 36)
NVFORTRAN-W-0155-Data clause needed for exposed use of pointer do_work$sd (test_pointer_procedure.f90: 21)
NVFORTRAN-S-0155-Accelerator region ignored; see -Minfo messages  (test_pointer_procedure.f90: 21)
NVFORTRAN-S-0000-Internal compiler error. size_of: bad dtype       39  (test_pointer_procedure.f90: 20)
NVFORTRAN-S-0000-Internal compiler error. size_of: bad dtype       39  (test_pointer_procedure.f90: 28)
NVFORTRAN-S-0000-Internal compiler error. size_of: bad dtype       39  (test_pointer_procedure.f90: 33)
test_pointer_procedure:
     20, Generating enter data create(do_work,a(:))
     21, Accelerator restriction: size of the GPU copy of do_work$sd is unknown
         Accelerator region ignored
     22, Loop not vectorized/parallelized: contains call
     25, Generating exit data delete(do_work)
         Generating exit data copyout(a(:))
     26, maxval reduction inlined
         Loop not fused: function call before adjacent loop
         Generated vector simd code for the loop containing reductions
     28, Generating enter data create(a(:))
     29, Generating present(do_work,a(:))
         Generating NVIDIA GPU code
         30, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
     30, Loop not vectorized/parallelized: contains call
     33, Generating exit data delete(do_work)
         Generating exit data copyout(a(:))
     34, maxval reduction inlined
         Loop not fused: function call before adjacent loop
         Generated vector simd code for the loop containing reductions
  0 inform,   1 warnings,   5 severes, 0 fatal for test_pointer_procedure
do_work_ok:
     37, Generating acc routine seq
         Generating NVIDIA GPU code
do_work_ko:
     44, Generating acc routine seq
         Generating NVIDIA GPU code

Hi Stefano,

Unfortunately, no, we do not support using procedure pointers in device code. Up until recently (CUDA 12.x, but I’ve forgotten exactly which release), NVIDIA didn’t have a dynamic linker so there was no method we could use to support late binding.

Now that it’s possible, we’ll likely revisit adding this support, but I’m not sure when that will be. Our team is in the process of developing a new flang Fortran compiler with the LLVM community. Hence, they’ve not been adding new features to nvfortran. Maybe after the new flang is out, but I’m not sure how high of a priority this will be.

If gfortran does what you need, then that’s great. With the exception of “kernels”, their support for OpenACC has gotten better and makes a good alternative to nvfortran.
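
On the second question, one workaround that avoids both procedure pointers and preprocessing flags is to select the scheme with an integer identifier and dispatch through a single device routine containing a `select case`. The following is only a minimal sketch under that assumption: the module name `schemes`, the constants `SCHEME_OK` and `SCHEME_KO`, and the dispatcher `do_work` with its extra `scheme` argument are hypothetical names introduced for illustration, and the two scheme bodies are copied from the example above.

```fortran
module schemes
implicit none

! Hypothetical scheme identifiers, introduced only for this sketch
integer, parameter :: SCHEME_OK = 1
integer, parameter :: SCHEME_KO = 2

contains
   ! Dispatcher: all binding is resolved at compile time, so no
   ! procedure pointer is needed in device code
   subroutine do_work(scheme, i, x)
   integer, intent(in)  :: scheme, i
   real,    intent(out) :: x
   !$acc routine seq
   select case (scheme)
   case (SCHEME_OK)
      call do_work_ok(i, x)
   case (SCHEME_KO)
      call do_work_ko(i, x)
   case default
      x = 0.0   ! unknown scheme: return a neutral value
   end select
   endsubroutine do_work

   subroutine do_work_ok(i, x)
   integer, intent(in)  :: i
   real,    intent(out) :: x
   !$acc routine seq
   x = real(i)
   endsubroutine do_work_ok

   subroutine do_work_ko(i, x)
   integer, intent(in)  :: i
   real,    intent(out) :: x
   !$acc routine seq
   x = -real(i)
   endsubroutine do_work_ko
endmodule schemes

program test_scheme_dispatch
use schemes
implicit none

integer, parameter :: ni = 1000
integer            :: i, scheme
real               :: a(ni)

! The scheme is chosen at run time on the host; the scalar is
! passed to the device region implicitly
scheme = SCHEME_OK
!$acc parallel loop copyout(a)
do i=1, ni
   call do_work(scheme, i, a(i))
enddo
print *, ' work ok', maxval(a)

scheme = SCHEME_KO
!$acc parallel loop copyout(a)
do i=1, ni
   call do_work(scheme, i, a(i))
enddo
print *, ' work ko', maxval(a)

endprogram test_scheme_dispatch
```

Because `scheme` has the same value for every iteration, the branch should not cause divergence between threads, although it is still evaluated once per call. If that overhead matters, the `select case` can instead be moved outside the loop on the host, at the cost of one copy of the loop per scheme.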

-Mat

Dear Mat,

thank you very much, you are always very kind.

Kind regards,
Stefano