The code below is written for Fortran 2003 and I am using the following compiler version.
pgfortran 15.5-0 64-bit target on Apple OS/X -tp haswell
The codebase is huge, so I am including only the sections that I think are relevant below. I am getting the following error messages; please help me understand them.

809, Generating copyin(field1_proxy%data(1:undf),field2_proxy%data(1:undf))
Generating copyout(field_res_proxy%data(1:undf))
Accelerator kernel generated
810, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
809, Generating Tesla code
nvvmCompileProgram error: 9.
Error: Warning: Linking two modules of different target triples!
psy/psykal_lite.F90(809): Error: Formal parameter space overflowed (4096 bytes max) in function invoke_axpy_809_gpu

PGF90-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (psy/psykal_lite.F90: 1)
PGF90/x86-64 OSX 15.5-0: compilation aborted

!> invoke_axpy:  (a * x + y) ; a-scalar, x,y-vector
  subroutine invoke_axpy(scalar,field1,field2,field_res)
    use log_mod, only : log_event, LOG_LEVEL_ERROR
    implicit none
    type( field_type ), intent(in )    :: field1,field2
    type( field_type ), intent(inout ) :: field_res
    real(kind=r_def),   intent(in )    :: scalar
    type( field_proxy_type)            :: field1_proxy,field2_proxy      &
                                        , field_res_proxy
    integer                            :: i,undf

    field1_proxy = field1%get_proxy()
    field2_proxy = field2%get_proxy()
    field_res_proxy = field_res%get_proxy()

    !sanity check
    undf = field1_proxy%vspace%get_undf()
    if(undf /= field2_proxy%vspace%get_undf() ) then
      ! they are not on the same function space
      call log_event("Psy:axpy:field1 and field2 live on different w-spaces" &
                    , LOG_LEVEL_ERROR)
    end if
    if(undf /= field_res_proxy%vspace%get_undf() ) then
      ! they are not on the same function space
      call log_event("Psy:axpy:field1 and result_field live on different w-spaces" &
                    , LOG_LEVEL_ERROR)
    end if
!$acc parallel loop copyin(field1_proxy%data(1:undf),field2_proxy%data(1:undf)) copyout(field_res_proxy%data(1:undf))
    do i = 1,undf
      field_res_proxy%data(i) = (scalar * field1_proxy%data(i)) + field2_proxy%data(i)
    end do
!$acc end parallel loop
  end subroutine invoke_axpy

type, public :: field_type

    !> Each field has a pointer to the function space on which it lives
    type( function_space_type ), pointer         :: vspace => null( )
    !> Allocatable array of type real which holds the values of the field
    real(kind=r_def), allocatable         :: data( : )


  contains

    !> Function to get a proxy with public pointers to the data in a
    !! field_type.
    procedure, public :: get_proxy

    !> Sends the field contents to the log
    !! @param[in] title A title added to the log before the data is written out
    procedure, public :: log_field
    procedure, public :: log_dofs
    procedure, public :: log_minmax

    !> function returns the enumerated integer for the functions_space on which
    !! the field lives
    procedure         :: which_function_space

    !> Routine to read field
    procedure         :: read_field

    !> Routine to write field
    procedure         :: write_field

  end type field_type

Hi Karthee_s,

The “Formal parameter space overflowed” error is odd. CUDA limits the total size of the arguments passed to a kernel (4096 bytes, per the message), but we take care of that by wrapping the arguments in a struct, so it’s unclear why this would be happening. Also, the arguments here aren’t that big.

Can you please send a reproducing example to PGI Customer Service? We’ll need to recreate the issue here in order to determine the cause.


Hi Mat,

I have emailed part of the code, which can be used to reproduce this issue. I hope it helps identify the problem.


Karthee S

Hi Karthee,

The underlying problem is an error in your code, which our LLVM code generator trips over instead of reporting it clearly. Besides the data members, the derived-type variables themselves need to be copied over to the device before they can be used in the compute region. Once I make the following change, the file compiles successfully.

!$acc parallel loop copyin(field1_proxy,field_res_proxy) &
!$acc               copyin(field1_proxy%data(1:undf)) copyout(field_res_proxy%data(1:undf))
    do i = 1,undf
       field_res_proxy%data(i) = field1_proxy%data(i)
    end do
!$acc end parallel loop

I have added TPR#21741 since the compiler should be giving you a meaningful error message.
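
For reference, here is a minimal, self-contained sketch of the same change applied to the full three-field axpy loop from the original post. It is only an illustration under stated assumptions: proxy_sketch_type is a hypothetical stand-in for field_proxy_type that models nothing but the allocatable data member, r_def is assumed to be double precision, and undf is passed in directly rather than obtained from vspace. It has not been tested against pgfortran 15.5; without OpenACC enabled the directives are treated as comments and the loop simply runs on the host.

```fortran
module axpy_sketch_mod
  implicit none
  integer, parameter :: r_def = kind(1.0d0)   ! assumption: double precision

  ! Hypothetical stand-in for field_proxy_type: only the data member is modelled.
  type :: proxy_sketch_type
    real(kind=r_def), allocatable :: data(:)
  end type proxy_sketch_type

contains

  subroutine axpy_sketch(scalar, x, y, res, undf)
    real(kind=r_def),        intent(in)    :: scalar
    type(proxy_sketch_type), intent(in)    :: x, y
    type(proxy_sketch_type), intent(inout) :: res
    integer,                 intent(in)    :: undf
    integer :: i

    ! Copy the derived-type variables themselves first, then their data members.
!$acc parallel loop copyin(x, y, res) &
!$acc               copyin(x%data(1:undf), y%data(1:undf)) &
!$acc               copyout(res%data(1:undf))
    do i = 1, undf
      res%data(i) = (scalar * x%data(i)) + y%data(i)
    end do
!$acc end parallel loop
  end subroutine axpy_sketch

end module axpy_sketch_mod

program axpy_sketch_demo
  use axpy_sketch_mod
  implicit none
  integer, parameter :: n = 8
  type(proxy_sketch_type) :: x, y, res
  integer :: i

  allocate(x%data(n), y%data(n), res%data(n))
  x%data   = [(real(i, r_def), i = 1, n)]
  y%data   = 1.0_r_def
  res%data = 0.0_r_def

  call axpy_sketch(2.0_r_def, x, y, res, n)

  ! Check res(i) = 2*x(i) + y(i) = 2*i + 1 for every element.
  do i = 1, n
    if (abs(res%data(i) - (2.0_r_def*i + 1.0_r_def)) > 1.0e-12_r_def) then
      print *, "axpy mismatch at i =", i
      stop 1
    end if
  end do
  print *, "axpy sketch OK"
end program axpy_sketch_demo
```

The pattern is the same as in the change above: each derived-type variable appears in a copyin clause ahead of the clause that moves its data member, including the result variable, whose data member is then handled by copyout.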