OpenACC Accelerator restriction: call to 'function' with no acc routine information

Hi everyone,

I’m dealing with the acceleration of a CFD code written in Fortran90 (MPI) and compiled with mpif90 under nvhpc-23.1. The GPU acceleration is performed with OpenACC and it is a multi GPU code. During the acceleration of a subroutine I have this error in compilation:

Accelerator restriction: call to ‘egspar’ with no acc routine information

The function egspar is called inside a triple DO loop in this way:

SUBROUTINE MolProp
USE global_mod
USE common_alloc
USE common_mpi
USE openacc

 !$acc update device(B,BBB,MINi,MAXi,MINj,MAXj,MINk,MAXk)
 !$acc update device(Yi(:,:,:,:),T(:,:,:),p(:,:,:),rho(:,:,:),Wmix(:,:,:))
 !$acc parallel loop gang vector collapse(3) &
 !$acc& private(s,ss,T_ijk,Wmix_ijk,Yi_ijk,Mis,Ks,Xir,appo,appo1,appo2,appo3,appo4,appo5,appo6) &
 !$acc& private(T_dim,c,DelTc,c_p1,d3,d4,d5,d2_KT,T_LJij,T_LJij2,T_LJij3,T_LJij4,T_LJij5,OMij,f) &
 !$acc& private(VISC_ijk,COND_ijk,DIFF_ijk) &
 !$acc& private(p_app,W_app,yyi_ijk,xxir,dtemp,CCp,st,st1,d_ij,stampa) &
 !$acc& copy(i,j,k) copyin(Yi(:,:,:,:),T(:,:,:),Wmix(:,:,:),p(:,:,:),rho(:,:,:)) copyout(VISC(:,:,:),COND(:,:,:),DIFF(:,:,:,:)) 
 do k = MINk(BBB)-GHOST ,MAXk(BBB)+GHOST
    do j = MINj(BBB)-GHOST ,MAXj(BBB)+GHOST
       do i= MINi(BBB)-GHOST ,MAXi(BBB)+GHOST

             !....Some calculations performed correctly.....

             call egspar(T_ijk,  xxir, yyi_ijk, CCp, weg, iweg, st, st1 )

      end do
    end do
  end do

where, for the sake of clarity yyi_ijk is a POINTER. Nevertheless, egspar is written in another file, which unfortunately is written in C language. The function is the following:

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <openacc.h>

int egspar_c ( double T, double *x, double y, double *cp, double *weg, int *iweg ){

double summa, aaa, sss, wwtr, app, app1 ;
int ns, i, j ;

lspar_c(egegeg_.JFLAG, *(weg+egegeg_.IEGPA), T, *(weg+egegeg_.IDLT1), *(weg+egegeg_.IDLT2), *(weg+egegeg_.IDLT3), *(weg+egegeg_.IDLT4),
        *(weg+egegeg_.IDLT5), *(weg+egegeg_.IDLT6), *(weg+egegeg_.IEGRU), egegeg_.NS, (weg+egegeg_.IEGWT), (weg+egegeg_.IBIN),
        (weg+egegeg_.IETA), (weg+egegeg_.IETALG), (weg+egegeg_.IAIJ), (weg+egegeg_.IBIJ), (weg+egegeg_.ICIJ), (weg+egegeg_.ICTAIJ),
        (weg+egegeg_.IFITA), (weg+egegeg_.IFITB), (weg+egegeg_.IFITC), (weg+egegeg_.ICINT), cp, (weg+egegeg_.ICXI),
        (weg+egegeg_.IEPSIJ), (weg+egegeg_.IEGEPS), (weg+egegeg_.IEGCFD), (weg+egegeg_.IEGCFE), (weg+egegeg_.IEGZRT),
        (weg+egegeg_.IEGDIP), (iweg+egegeg_.IEGLIN));

summa = 0.0e0;
ns = egegeg_.NS;

for (int i = 0; i<ns; i++)
  summa = summa + *(x+i);

  *(weg+egegeg_.ISUMTR) = summa;
  aaa = summa / (double) ns;
  sss = 1.0e-16;
  wwtr = 0.0e0;
  for (int i = 0; i<ns; i++) {
    *(weg + egegeg_.IXTR + i ) = *(x + i) + sss *( aaa - *(x + i) );
    wwtr = wwtr + *(weg + egegeg_.IXTR + i ) * *(weg + egegeg_.IEGWT + i );
  }
  *(weg + egegeg_.IWWTR) = wwtr;
  for (int i = 0; i<ns; i++)
    *(weg + egegeg_.IYTR + i ) = *(weg + egegeg_.IXTR + i ) * *(weg + egegeg_.IEGWT + i ) / wwtr;

  egzero_c( ns, weg + egegeg_.IAUX);

  for (int i = 0; i<ns; i++){
    app1 = *(weg + egegeg_.IYTR + i ); app = 0.0e0;
    for (int j = i+1; j<ns; j++){
      app =  app + *(weg + egegeg_.IYTR + j);
      *(weg + egegeg_.IAUX + j) = *(weg + egegeg_.IAUX + j) + app1;
    }
    *(weg + egegeg_.IAUX + i ) = *(weg + egegeg_.IAUX + i) + app;
  }
}

This C code is compiled in the following way:

nvc -c -acc=gpu -target=gpu,noautopar -gpu=cc80 -Mpreprocess -Mfree -Mextend -Munixlogical -Mbyteswapio -traceback -Mchkstk -Mnostack_arrays -Mnofprelaxed -Mnofpapprox -Minfo=accel …/SHARED_FILES/EGSlib_c.c

I don’t understand this type of problem, since, I always do the same with other call to functions or subroutines written in Fortran, but I’ve never had an issue of this type. Could it be related to the fact that is written in C? I have to admit that I have no experience in C programming therefore I have some problems dealing with it.

Thank you in advance,
-Matteo

This means that the OpenACC compute region is calling a routine that doesn’t have a device version available.

For this, you need to add the “routine” directive in two spots. #1, As part of the routines definition which tells the compiler to create a device version of this routine. In this case would be:

#pragma acc routine seq
int egspar_c ( double T, double *x, double y, double *cp, double *weg, int *iweg ){

double summa, aaa, sss, wwtr, app, app1 ;

The second spot is to tell the caller that the device routine is available. In some cases, these can be combined such as in a module or a C header file since the directive is visible to both the prototype and definition of the routine, but here since they’re two different source files and languages, you do need both.

I presume you have an Fortran interface for “egspar”? If so, the you can add “!$acc routine” to the body of the interface.

If not and you’re instead using F77 style calling conventions, you’d add the following to “MolProp”:

SUBROUTINE MolProp
USE global_mod
USE common_alloc
USE common_mpi
USE openacc
implicit none
!$acc routine(egspar)
1 Like

Hi @MatColgrove ,

thank you for your suggestions. I really appreciate it. I implemented all you said but I think I’m missing something. In particular I add the #pragma acc routine seq to the egspar_c routine and also !$acc routine inside the interface in this way:

interface
    subroutine egspar(t, xi, yi, cp, w, iw, st, st1)
    !$acc routine 
      double precision          :: t, st(3), st1(3)
      double precision :: xi(NsMAX), yi(NsMAX), cp(NsMAX), w(NsMAX)
      integer, pointer          :: iw(:)
    end subroutine
  end interface

Now the previous issue is solved but when I compile I have this compilation error:

nvlink error   : Undefined reference to 'egspar_' in 'mol_prop.o'

I also tried different strategies like using !$acc routine(egspar) after the implicit none in the f90 code but I obtain the same nvlink error. Nevertheless, I can say that now the C code is compiled because I obtain “-Minfo=accel” information during the compilation phase, even if, I notice something strange also here. In particular, I see that one of the C routine gives me this warning:

lspar_c:
    102, Accelerator restriction: unsupported statement type: opcode=DEALLOC

nvvmCompileProgram error 9: NVVM_ERROR_COMPILATION.
Error: /tmp/nvaccFXwLbDLp_yW_r.gpu (684, 14): parse use of undefined value '@lspar_c'
NVC++-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (../SHARED_FILES/EGSlib_c.c: 1)
NVC++/x86-64 Linux 23.1-0: compilation completed with warnings

This lspar is a void routine called by egspar. The line 102 that gives me the warning is a return. This routine is the following:

void lspar_c ( int iflag, double patmos, double T, double dlt1, double dlt2, double dlt3, double dlt4, double dlt5, double dlt6,
               double ru, int ns, double *wt, double *bin, double *eta, double *etalg, double *aij, double *bij, double *cij, double *ctaij,
               double *fita, double *fitb, double *fitc, double *cint, double *cpms, double *cxi, double *epsij, double *eps,
               double *cofd, double *cofe, double *zrot, double *dip, int *lin ) {

double p, crot;
double era_c[ns], etalg_c[ns], dr, sq, dr32, dd, dd32 ;
double aaaa, bbbb, aaa;
int rrr, ii, ibin;

const double PI     = 3.1415926535e0; const double PI32O2 = 2.7842e+00;
const double P2O4P2 = 4.4674e+00;  const double PI32   = 5.5683e+00;

dlt1 = log ( T );    dlt2 = dlt1 * dlt1;  dlt3 = dlt2 * dlt1;
dlt4 = dlt3 * dlt1;  dlt5 = dlt4 * dlt1;  dlt6 = dlt5 * dlt1 ;
for (int i = 0; i<ns; i++){
  era_c[i] = etalg_c[i] = 0.0e0;
}

rrr = egscofe_c ( cofe, &ns, &dlt1, &dlt2, &dlt3, eta, etalg );
if (iflag <= 1) return;   <=== LINE 102

egscofd_c ( cofd , ns, dlt1, dlt2, dlt3, bin );

if (iflag <= 2) return;

//       DETERMINE A*, B*, AND C* FOR EACH SPECIES PAIR
if ( iflag == 3 || iflag == 5 ){
  for (int j = 0; j<ns; j++)
    for (int i = j; i<ns; i++){
      *(aij+j*ns+i) = *(fita+(ns*j+i)*7+0 ) + *(fita+(ns*j+i)*7+1) * dlt1 + *(fita+(ns*j+i)*7+2) * dlt2 +
                                              *(fita+(ns*j+i)*7+3) * dlt3 + *(fita+(ns*j+i)*7+4) * dlt4 +
                                              *(fita+(ns*j+i)*7+5) * dlt5 + *(fita+(ns*j+i)*7+6) * dlt6;
      if(i<j)
        *(aij+i*ns+j) = *(aij+j*ns+i);
    }
}
else if (iflag == 7)
  rrr = abc_c ( fita, fitb, fitc, &ns, &dlt1, &dlt2, &dlt3, &dlt4, &dlt5, &dlt6, aij, bij, cij );

  if (iflag == 3) return;

for (int k = 0; k<ns; k++) {
  if(*(lin+k) == 0)     { crot = 0.0e0; *(cint+k) = 0.0e0; }
  else if(*(lin+k) == 1){ crot = 1.0e0; *(cint+k) = *(cpms+k) * (*(wt+k)/ru) - 2.5e0; }
  else if(*(lin+k) == 2){ crot = 1.5e0; *(cint+k) = *(cpms+k) * (*(wt+k)/ru) - 2.5e0; }
  dr   = *(eps+k) / 298.0e0;
  sq   = sqrt(dr);
  dr32 = sq * dr;
  aaaa = ( 1.0e0 + PI32O2 * sq + P2O4P2 * dr + PI32 * dr32);
  dd   = *(eps+k) / T;
  sq   = sqrt(dd);
  dd32 = sq * dd;
  bbbb = ( 1.0e0 + PI32O2 * sq + P2O4P2 * dd + PI32 * dd32);
  *(cxi+k) =  crot  / ((aaaa/bbbb) * max( 1.0e0, (*(zrot+k)) ) * PI );
}

if (iflag <= 5) return;

for (int i = 0; i<ns; i++) {
    ii = i * ns + i;
    if(iflag == 7)
      aaa = *(aij+ii);
    else
      aaa = *(ctaij+ii);
    ibin = i * ns - (i * (i+1)) / 2 + i;
    *(bin+ibin) = 5.0e0 * patmos * (*(wt+i)) / ( 6.0e0 * ru * T * aaa * (*(eta+i)));
    if( *(dip+i) != 0.0e0 )
      *(bin+ibin) = *(bin+ibin) * ( 1.0e0 + 2.985e3 / pow(T,1.5e0));
  }
}

Thank you again for your immense support and above all patience.

-Matteo

I think for this one you just need to add the ISO_C_BINDING to the C routine so the names match. Something like:

...
use iso_c_binding
....
interface
    subroutine egspar(t, xi, yi, cp, w, iw, st, st1) bind(C, name="egspar_c")
    !$acc routine 
      double precision          :: t, st(3), st1(3)
      double precision :: xi(NsMAX), yi(NsMAX), cp(NsMAX), w(NsMAX)
      integer, pointer          :: iw(:)
    end subroutine
  end interface

Error: /tmp/nvaccFXwLbDLp_yW_r.gpu (684, 14): parse use of undefined value '@lspar_c'

Try adding “#pragma acc routine” to lspar_c as well as any other routines called from device code.

Keep in mind that the “routine” directive must be visible to both the caller and the callee, so it’s best to put it in header file next to the routine’s prototype.

1 Like

Hi Mat,

I implemented the correction you suggested in this way:

...
use, intrinsic :: iso_c_binding
...

IMPLICIT NONE
  
  interface
    subroutine egspar(t, xi, yi, cp, w, iw, st, st1) bind(C, name="egspar_c")
    !$acc routine 
      double precision,      :: t, st(3), st1(3)
      double precision       :: xi(NsMAX), yi(NsMAX), cp(NsMAX), w(NsMAX)
      integer, pointer         :: iw(:)
    end subroutine
  end interface
 
...
Some definitions and calculations
...

call egspar(T_ijk,  xxir, yyi_ijk, CCp, weg, iweg, st, st1)


and I obtain the same error with a different routine name:

nvlink error : Undefined reference to ‘egspar_c’ in ‘mol_prop.o’

It is strange. It’s like the C code is not linked well. For the sake of clarity I say you that I compile the C code in the following manner:

nvc -c -acc=gpu -target=gpu -gpu=cc80 -Minfo=accel EGSlib_c.c

where EGSlib_c.c is the file containing egspar. The final code is compiled in this way:

mpif90 -acc=gpu -target=gpu -gpu=cc80 -o CFD_CODE …Other objets… mol_prop.o EGSlib_c.o -L…/LAPACK -llapack_LINUX -lblas_LINUX



Regarding the other issue (lspar_c), I’ve added the #pragma acc routine to lspar_c but I always obtain the same:

lspar_c:
    104, Accelerator restriction: unsupported statement type: opcode=DEALLOC

nvvmCompileProgram error 9: NVVM_ERROR_COMPILATION.
Error: /tmp/nvaccI4S6cM3hv9M34.gpu (684, 14): parse use of undefined value '@lspar_c'
NVC++-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (../SHARED_FILES/EGSlib_c.c: 1)
NVC++/x86-64 Linux 23.1-0: compilation completed with warnings

I cannot understand what are the differences between this subroutine and the others. It looks fine to me but for sure I’m making some mistakes.

Thank you!!!

-Matteo

Can you send me your full source? You can direct message me if you can’t post it publicly, or I can send you my email address.

I think this will easier that going back and forth. I’d want to double check that you have the directives in the correct spot. Also lspar_c might not be generating a device routine given the error which I want to look into.

Thank you Mat! Is it possible to share you the code with all fortran objects (.o) and the parts of interest with the full source code? This is not my fault but just institute policy. Let me know and I can give you my email address in private.

Thank you again! I definitely need a “Mat” in my workplace =)

-Matteo

We’ll make it work. I’ll send you an email and we can arrange a way to for you to send me the package.

1 Like

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

Per a request, I’m adding a summary of the offline discussion.

The “DEALLOC” error is due to the use of VLA’s in the C device code. This was resolved by updating the compiler version to 24.1 or later. Note that while C VLAs are supported in device code, they are not recommended. VLAs are implicitly allocated each time the code enters the device subroutine. Device side mallocs are very expensive for performance and the default device heap size is quite small often leading to heap overflows if the user does not increase the heap size via the environment variable NV_ACC_CUDA_HEAPSIZE or by calling cudaSetLimits.

A secondary issue was while Matteo had a “routine” directive in the Fortran interface, he was missing it in the C code, so was seeing linker error give the C device routine wasn’t getting created.