OpenACC + GPU: Attempt to call method of allocated derived type class: "Call to NVHPC runtime function not supported - pgf90_test_and_set_type_i8"

marshall.ward · October 4, 2024, 7:21pm

I am trying to use a method in an allocated derived type, declared as class, but am getting a compile-time error from nvfortran when OpenACC is enabled.

The example code is an abstract type which contains the method:

module EOS_base_mod
implicit none

type, abstract :: EOS_base
  contains
  procedure :: calc_density_array
end type EOS_base

contains

subroutine calc_density_array(this)
  class(EOS_base), intent(in) :: this
end subroutine calc_density_array

end module EOS_base_mod

and a wrapper type which contains the allocatable class and calls its method

module EOS
use EOS_base_mod, only : EOS_base

implicit none

type :: EOS_type
  class(EOS_base), allocatable :: type
end type

contains

subroutine calculate_density_1d(EOS)
  !$acc routine
  type(EOS_type), intent(in) :: EOS

  call EOS%type%calc_density_array()
end subroutine calculate_density_1d

end module EOS

If I try to compile EOS.o with nvfortran -acc, then I get the following error:

NVFORTRAN-S-1058-Call to NVHPC runtime function not supported - pgf90_test_and_set_type_i8 (EOS.F90: 16)

Is this unsupported? What is the error referring to here?

If I compile without -acc then there is no issue.

The source code is also available here: GitHub - marshallward/alloc-method-call-mre

Should have included my compiler version:

$ nvfortran --version

nvfortran 24.5-0 64-bit target on x86-64 Linux -tp znver2
NVIDIA Compilers and Tools
Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

MatColgrove · October 4, 2024, 10:14pm

Using abstract types aren’t supported in device code. This requires a runtime lookup to create the late binding to the specific routine call. Device routines calls need to be statically linked.

The error saying that there’s no device callable version of the compiler runtime routine that’s doing the type association.

marshall.ward · October 5, 2024, 1:50pm

Thanks for the explanation. Is there any chance that they could be supported in the future? Or is there some limitation here making it impossible?

It also sounds like this is not unique to Fortran. Would this extend to other OO languages?

Also, if I remove the abstractions (remove the abstract keyword and replace class(EOS_type) with type(EOS)), then I get the following error:

$ make
nvfortran -acc -Minfo -c EOS_base_mod.F90
nvfortran -acc -Minfo -c EOS.F90
calculate_density_1d:
     13, Generating acc routine seq
         Generating NVIDIA GPU code
nvvmCompileProgram error 9: NVVM_ERROR_COMPILATION.
Error: /tmp/nvaccFZkZeDTYXxiOS.gpu (31, 45): parse use of undefined value '@eos_base_mod$$$eos_base$$td'
NVFORTRAN-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (EOS.F90: 13)
make: *** [Makefile:5: EOS.o] Error 2

The un-abstracted files are shown below.

module EOS_base_mod
implicit none

type :: EOS_base
  contains
  procedure :: calc_density_array
end type EOS_base

contains

subroutine calc_density_array(this)
  class(EOS_base), intent(in) :: this
end subroutine calc_density_array

module EOS
use EOS_base_mod, only : EOS_base

implicit none

type :: EOS_type
  type(EOS_base) :: etype
end type

contains

subroutine calculate_density_1d(EOS)
  !$acc routine
  type(EOS_type), intent(in) :: EOS

  call EOS%etype%calc_density_array()
end subroutine calculate_density_1d

end module EOS

MatColgrove · October 7, 2024, 4:41pm

I added your example code to an RFE (TPR #36623) to see if we can add support for classes in OpenACC code.

Is there any chance that they could be supported in the future? Or is there some limitation here making it impossible?

It’s a long standing limitation, but there has been some progress, though mostly on the C++ side with virtual function using STDPAR. Fortran type bound procedures are planned, but not until some time after our new Fortran compiler, based on LLVM’s F18 project, is released.

marshall.ward · October 7, 2024, 5:41pm

Thank you for adding this, and thanks to the engineers for continuing to explore this topic. We rely on these abstractions, and it could be difficult to undo them.

One other question: When you say this,

does this refer to the example in my second post, with the apparent parsing error? Should we just assume there is no support for type-bound procedures on the GPU?

MatColgrove · October 7, 2024, 7:25pm

I was more responding to your question about future support.

Should we just assume there is no support for type-bound procedures on the GPU?

Currently this is correct. However some recent developments have made it possible so we’ll likely look to adding this support at some point (longer term) in the future.

hakostra · October 8, 2024, 6:43am

Any estimates on a timeframe for this? Any beta/pre-release planned? Not using type bound procedures on a device give a lot of restrictions and extra effort in porting code that already use these successfully.

I also ask because I have several, severe problems with nvfortran that prohibit its usage, even on plain x86 host CPU’s. flang-new from LLVM (soon to be renamed flang) is working great without any issues (again - on x86 host architecture - I know offloading is not really much worked on yet). As it is now I doubt that the issues with current nvfortran will be resolved before the “new” Nvidia compiler based on LLVM flang comes out…

MatColgrove · October 8, 2024, 6:13pm

I can’t share specific timelines, but the initial goal for GPU offload in the new nvfortran would be feature parity. So while it’s possible type-bound procedures support in device code is there on day one, I wouldn’t count on it. More likely it would come later, assuming the technical challenges can be overcome.

Wish I could be more specific, but given things are still in development and fluid, I can’t.

If flang-new works for you, I’d recommend using it. You wont be able to access the offload features, but will get the new Fortran language features.

ivan.pribec1 · October 10, 2024, 3:54pm

Isn’t it feasible to just devirtualize (restore) the concrete type for the paths you need? In your original snippet calculate_density_1d isn’t polymorphic.

For instance like this: Compiler Explorer (nvfortran is now also available in CE!)

module EOS_base_mod
implicit none

type, abstract :: EOS_base
contains
  procedure :: calc_density_array
end type EOS_base

contains

subroutine calc_density_array(this)
  class(EOS_base), intent(in) :: this
end subroutine calc_density_array

end module EOS_base_mod

module EOS_child_mod
use EOS_base_mod, only: EOS_base
implicit none

type, extends(EOS_base) :: EOS_child
contains
  procedure :: calc_density_array => calc_density_array_host
end type

contains
   ! Default host procedure
   subroutine calc_density_array_host(this)
      class(EOS_child), intent(in) :: this
   end subroutine

   ! An ACC version is available too
   subroutine calc_density_array_acc(this)
      !$acc routine
      type(EOS_child), intent(in) :: this
   end subroutine

end module


module EOS
use EOS_base_mod, only : EOS_base

implicit none

type :: EOS_type
  class(EOS_base), allocatable :: type
end type

logical, parameter :: use_acc = .true.

contains

subroutine calculate_density_1d(EOS)
    use EOS_child_mod, only: EOS_child, calc_density_array_acc
    type(EOS_type), intent(in) :: EOS

    select type (tmp => EOS%type)
    type is (EOS_child)
        if (use_acc) then
            call calc_density_array_acc(tmp)
        else
            call tmp%calc_density_array()
        end if
    class default
        call tmp%calc_density_array
    end select
end subroutine calculate_density_1d

end module EOS

Depending on what you are trying to achieve, it may make more sense to devirtualize in the calling scope:

program EOS_main

use EOS, only: EOS_type, calculate_density_1d, use_acc
use EOS_child_mod, only: EOS_child, calc_density_array_gpu
implicit none

type(EOS_type) :: top

! top%type could be set by a factory function
allocate(EOS_child :: top%type)

select type(type => top%type)
type is (EOS_child)
  ! We know EOS_child has an OpenACC variant
  if (use_acc) then
    !$acc kernels
    call calc_density_array_gpu(type)
    !$acc end kernels
  else
    ! Normal polymorphic path
    call calculate_density_1d(top)
  end if
class default
  ! Normal polymorphic path
  call calculate_density_1d(top)
end select

print *, "done!"

end program

Perhaps even more natural would be to provide an extended child type which allows runtime selection of OpenACC:

type, extends(EOS_base) :: EOS_child
   logical :: use_acc = .false.
contains
   procedure :: calc_density_array => calc_density_array_EOS_child
end type

subroutine calc_density_array_EOS_child(this)
   class(EOS_child), intent(in) :: this
   !$acc kernels if(this%use_acc)
       ! ...
   !$acc end kernels
end subroutine

This would be more in-line with the “depend on abstractions, not concretions” philosophy. In other words, the container EOS_type should not need to be aware of low-level details like what parallel API you are using.

The wrapper calc_density_array could now remain unchanged.

program main
use EOS, only: EOS_type, calculate_density_1d
implicit none

type(EOS_type) :: top

! This block would be replaced with a factory function
block
    use EOS_child_mod, only: EOS_child
    top%type = EOS_child(use_acc=.true.)
end block

call calculate_density_1d(top)
print *, "done!"

end program

Here is a link to CE: Compiler Explorer. This is the least intrusive way because nothing needs to change in your type hierarchy.

marshall.ward · October 10, 2024, 5:37pm

@ivan.pribec1 Thanks for your suggestions. It is very likely that we will move the OpenACC kernel/loop directives into the calculate_density_1d functions, as in your calc_density_array_acc. This will probably require more work, since we must pull these function calls out of existing loops and must precompute the values into new arrays. But we should probably have done this anyway, for performance reasons.

It’s unlikely that we will need to consider versions with and without OpenACC support, but your proposals are interesting and I’ll certainly keep them in mind.

ivan.pribec1 · October 10, 2024, 9:14pm

I believe this type of refactoring is common when adapting to GPUs, i.e. “flattening” the code so that more parallelism can be exposed by collapsing loops and the compiler can “see” the whole loop nest. I’ve seen similar patterns occur in a couple other projects.

There was blog post about refactoring from object-oriented to data-oriented design which involved getting rid of a virtual call and replacing the polymorphic function with an integer tag and a plain function:

Multi-GPU Programming with Standard Parallel C++, Part 1 | NVIDIA Technical Blog

The rules for virtual functions in CUDA C++ are given here. The error/constraint you are encountering seems close to this rule:

It is not allowed to pass as an argument to a __global__ function an object of a class with virtual functions.

I searched for other useful threads on this topic, and one StackOverflow answer had some good advice: How to use polymorphism in CUDA

… , I’d recommend isolating your polymorphism to the host code which launches the kernel.

Even on the CPU, if the same operations are being applied to every element of large vectors, I would consider refactoring so that the polymorphism is at a higher level, and the virtual method calls are not in your inner loops. Once you do that, parallelizing will be much more performant (in CUDA, OpenMP, or whatever).

Returning briefly to your original example, the procedure calc_density_array isn’t deferred (virtual). Is it meant to be overridden at all? If not, couldn’t you just make it a free function instead? This would be in-line with the following (C++) guideline: C.4: Make a function a member only if it needs direct access to the representation of a class. This guideline appears to be derived from the article by Scott Meyer in Dr. Dobb’s journal on How Non-Member Functions Improve Encapsulation.

marshall.ward · October 10, 2024, 9:44pm

the procedure calc_density_array isn’t deferred (virtual). Is it meant to be overridden at all?

This was a minimum reproducible example, only meant to replicate the error. In production, the model supports many equations of state (EOS), each an extension of EOS_base and with its own suite of functions.

Topic		Replies	Views
OpenACC Accelerator restriction: call to 'function' with no acc routine information nvc, nvc++ and nvfortran	9	522	November 26, 2024
Implicit OpenACC copies and full deep copies nvc, nvc++ and nvfortran	9	1458	July 13, 2022
Using classes in openACC nvc, nvc++ and nvfortran	11	736	March 20, 2023
Cannot dynamically load a shared library containing both OpenACC and CUDA code nvc, nvc++ and nvfortran	8	2639	August 24, 2022
Nvfortran error nvc, nvc++ and nvfortran	39	3324	January 17, 2024
The Fortran OpenACC acceleration code compiles successfully but still runs on the CPU nvc, nvc++ and nvfortran	14	32	December 28, 2024
"OpenACC" deepcopy support in current nvc releases? It works, but is it supported? nvc, nvc++ and nvfortran omniverse_extension	9	1118	February 23, 2022
Handling global variables in OpenACC kernels nvc, nvc++ and nvfortran	14	1008	August 14, 2023
Accelerating Fortran DO CONCURRENT with GPUs and the NVIDIA HPC SDK Technical Blog	28	2303	February 25, 2025
Incorrect GPU results with #pragma acc atomic capture nvc, nvc++ and nvfortran	11	1343	February 27, 2023

OpenACC + GPU: Attempt to call method of allocated derived type class: "Call to NVHPC runtime function not supported - pgf90_test_and_set_type_i8"

Related topics