Compilation error with -stdpar=gpu

I’m trying to compile a fork of MAS (the stdpar branch) using:

./build.sh ./conf/nvidia_gpu_psi.conf

The config file currently contains only these flags:
-O3 -march=native -stdpar=gpu -gpu=ccnative,mem:unified -Minfo=accel

Compilation fails with:

NVFORTRAN-F-0000-Internal compiler error. could not get result type from opc  (mas_cpp.f90)
NVFORTRAN/x86-64 Linux 26.1-0: compilation aborted
make: *** [Makefile:25: mas.o] Error 2

If I additionally add:
-acc=gpu
then it compiles successfully.

This fork is intended to rely on DO CONCURRENT offload via -stdpar=gpu, along with unified memory (-gpu=mem:unified), and does not explicitly use OpenACC directives. So I’m not sure why -acc=gpu is needed.

Why does adding -acc=gpu avoid the internal compiler error? Is there an unexpected dependency between -stdpar=gpu and the OpenACC runtime/toolchain in nvfortran for this large code, or does this point to a compiler bug?

Fork (stdpar branch):

Thanks for any insight.

So, while working on your other problem, I wondered whether this is related to your putting procedure calls inside your DO CONCURRENT constructs.

There’s a definite connection between DO CONCURRENT and OpenACC. I believe that when DO CONCURRENT was first implemented, we essentially mapped it to OpenACC constructs. So there’s still a connection somewhere in the compiler for portability and code-support purposes, even though DO CONCURRENT is more decoupled from OpenACC now than it originally was.

The interesting part here is that OpenACC does allow procedure calls in a parallel region, so I think the compiler is running into the DO CONCURRENT limitation and short-circuiting to OpenACC handling because you’re using some DO CONCURRENT constructs that aren’t otherwise supported. Without digging too deeply into it, that would be my guess as to what’s going on here.

On top of that, the compiler may simply be choking on the sheer amount of code being thrown at it, and the resulting error messages are unclear.
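For reference, this is roughly what “OpenACC allows procedure calls” looks like in practice: a procedure called from an OpenACC compute region is normally marked with `!$acc routine seq`, which asks the compiler to build a device version that each iteration calls sequentially. This is a minimal sketch; the module, function, and program names are hypothetical, and it only illustrates the OpenACC rule. It is not a claim about how nvfortran maps DO CONCURRENT internally.

```fortran
! Hypothetical example: calling a PURE module function from an OpenACC loop.
module acc_routine_sketch
  implicit none
contains
  pure function scale_value(x) result(v)
    !$acc routine seq   ! request a device-callable (sequential) version
    real, intent(in) :: x
    real :: v
    v = 2.0 * x
  end function scale_value
end module acc_routine_sketch

program acc_call_demo
  use acc_routine_sketch, only : scale_value
  implicit none
  integer, parameter :: n = 8
  real :: a(n), b(n)
  integer :: i

  a = [(real(i), i = 1, n)]

  ! Procedure call inside an OpenACC compute region
  !$acc parallel loop copyin(a) copyout(b)
  do i = 1, n
    b(i) = scale_value(a(i))
  end do

  ! Self-check: each element should be doubled
  if (any(b /= 2.0 * a)) error stop "unexpected result"
  print *, sum(b)   ! 2*(1+...+8) = 72.0
end program acc_call_demo
```

Built with `-acc=gpu` the loop is offloaded; without `-acc` the `!$acc` lines are treated as ordinary comments and the loop runs on the host, so the same source can be checked on either.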

If I understand correctly, even though there are no explicit OpenACC directives in the code and DO CONCURRENT does not allow unrestricted procedure calls inside the parallel region, the compiler behaves differently when the -acc flag is enabled.

Is it internally using the OpenACC handling when -acc is present, so that the procedure calls work even though they normally would not under strict DO CONCURRENT rules? In other words, is my code just exposing a compiler quirk?

I was also under the impression that procedure calls inside DO CONCURRENT are allowed if the procedures are PURE. I also thought that -stdpar=gpu automatically enabled -acc=gpu under the hood. Was this changed at some point?

So I just played around with it, and I believe that if you have explicit OpenACC directives in your code, either “-stdpar=gpu” or “-acc=gpu” will cause them to be parsed. However, DO CONCURRENT constructs appear to be parsed only under “-stdpar=gpu”. Your example suggests that when you mix the two (“-stdpar=gpu -acc=gpu”), some things that are only allowed in OpenACC regions can get handled in DO CONCURRENT constructs. Something odd is definitely going on. If you can carve out a cleaner example where it occurs, that would be helpful.

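With regard to the procedure calls, I think it’s complicated. The compiler doesn’t handle every kind of procedure reference; for example, this version, which calls a function known only through an interface block, fails to compile with the error shown further below: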

module profile_def
  implicit none

  type :: profile
    logical :: active = .false.
    real    :: f(3) = 1.0
    real    :: x(2) = (/ -1.0e20, 1.0e20 /)
    real    :: w(2) = 1.0
  end type profile

  type :: heat_source
    logical :: active = .false.
    real    :: h0 = 0.0
    type(profile) :: r_profile
  end type heat_source

end module profile_def


module profile_value_interface
  use profile_def, only : profile
  implicit none
  interface
    pure function profile_value(prof, x) result(v)
      import :: profile
      type(profile), intent(in) :: prof
      real,         intent(in) :: x
      real :: v
    end function profile_value
  end interface
end module profile_value_interface


module heating_mod
  use profile_def,            only : heat_source
  use profile_value_interface,only : profile_value
  implicit none
  private
  public :: heating

contains

  subroutine heating
    integer, parameter :: n = 16
    type(heat_source) :: heatsource(1)
    real :: rh_true(n)
    real :: rprof(n)
    integer :: i

    rh_true = [(real(i), i=1,n)]

    ! The call below is the trigger: procedure call inside DO CONCURRENT
    do concurrent (i = 1:n)
      rprof(i) = profile_value(heatsource(1)%r_profile, rh_true(i))
    end do
  end subroutine heating

end module heating_mod

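I think this is because profile_value is an external procedure whose body the compiler cannot see (only its interface is provided, via a separate module). By contrast, the following version is handled fine, presumably because everything is in one module and the function can be inlined: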

module repro_do_concurrent_proc_call
  implicit none
  private
  public :: heating

  ! Minimal "profile" type
  type :: profile
    logical :: active = .false.
    real    :: f(3) = 1.0
    real    :: x(2) = (/ -1.0e20, 1.0e20 /)
    real    :: w(2) = 1.0
  end type profile

  ! Minimal heat_source type, with a profile component
  type :: heat_source
    logical :: active = .false.
    real    :: h0 = 0.0
    type(profile) :: r_profile
  end type heat_source

  integer, parameter :: max_heat_sources = 2

contains

  ! PURE procedure being called inside DO CONCURRENT
  pure function profile_value(prof, x) result(val)
    type(profile), intent(in) :: prof
    real,         intent(in) :: x
    real :: val

    if (.not. prof%active) then
      val = 1.0
    else
      ! Any simple expression is fine; keep it trivial.
      val = prof%f(1) + x*0.0
    end if
  end function profile_value

  ! Subroutine whose DO CONCURRENT calls a PURE procedure from the same module
  subroutine heating
    type(heat_source) :: heatsource(max_heat_sources)
    real :: rh_true(4)
    real :: rprof
    integer :: i, n

    rh_true = (/ 1.0, 2.0, 3.0, 4.0 /)

    ! Key pattern: procedure call inside DO CONCURRENT
    do concurrent (i = 1:4)
      n = 1
      rprof = profile_value(heatsource(n)%r_profile, rh_true(i))
    end do
  end subroutine heating

end module repro_do_concurrent_proc_call

Compiling:

[scamp]$ nvfortran compiles.F90 -c -stdpar=gpu
[scamp]$ nvfortran crashes.F90 -c -stdpar=gpu
NVFORTRAN-S-1074-Procedure call in Do Concurrent is not supported yet (crashes.F90: 63)
  0 inform,   0 warnings,   1 severes, 0 fatal for heating
[scamp]$

Note that in this case, adding “-acc=gpu” doesn’t appear to resolve the issue, which suggests we may need a better example for that particular case. If you can carve one out, we can analyze it and try to help.
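If the failing case really comes down to the call not being inlinable, one possible stopgap is to write the function body directly into the loop so the DO CONCURRENT body contains no procedure call at all. This is a minimal sketch under that assumption: the program name is hypothetical, the inlined body is copied from the same-module profile_value above, and it has not been verified against nvfortran 26.1 on a GPU.

```fortran
! Types copied from the failing example (profile_def)
module profile_def
  implicit none
  type :: profile
    logical :: active = .false.
    real    :: f(3) = 1.0
    real    :: x(2) = (/ -1.0e20, 1.0e20 /)
    real    :: w(2) = 1.0
  end type profile

  type :: heat_source
    logical :: active = .false.
    real    :: h0 = 0.0
    type(profile) :: r_profile
  end type heat_source
end module profile_def

! Hypothetical driver: profile_value is inlined by hand
program heating_inlined
  use profile_def, only : heat_source
  implicit none
  integer, parameter :: n = 16
  type(heat_source) :: heatsource(1)
  real :: rh_true(n), rprof(n)
  integer :: i

  rh_true = [(real(i), i = 1, n)]

  do concurrent (i = 1:n)
    ! Equivalent of profile_value(heatsource(1)%r_profile, rh_true(i))
    if (.not. heatsource(1)%r_profile%active) then
      rprof(i) = 1.0
    else
      rprof(i) = heatsource(1)%r_profile%f(1) + rh_true(i)*0.0
    end if
  end do

  ! Profiles are inactive by default, so every entry should be 1.0
  if (any(rprof /= 1.0)) error stop "unexpected result"
  print *, sum(rprof)   ! 16 * 1.0 = 16.0
end program heating_inlined
```

This trades reuse for simplicity, so it is only practical when the called function is short; for larger routines, moving them into the same module as the caller (as in the version that compiles) is likely the less invasive option.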