OpenACC Accelerator restriction: call to 'function' with no acc routine information

matteo.cimini · February 29, 2024, 9:43am

Hi everyone,

I’m dealing with the acceleration of a CFD code written in Fortran90 (MPI) and compiled with mpif90 under nvhpc-23.1. The GPU acceleration is performed with OpenACC and it is a multi GPU code. During the acceleration of a subroutine I have this error in compilation:

Accelerator restriction: call to ‘egspar’ with no acc routine information

The function egspar is called inside a triple DO loop in this way:

SUBROUTINE MolProp
USE global_mod
USE common_alloc
USE common_mpi
USE openacc

 !$acc update device(B,BBB,MINi,MAXi,MINj,MAXj,MINk,MAXk)
 !$acc update device(Yi(:,:,:,:),T(:,:,:),p(:,:,:),rho(:,:,:),Wmix(:,:,:))
 !$acc parallel loop gang vector collapse(3) &
 !$acc& private(s,ss,T_ijk,Wmix_ijk,Yi_ijk,Mis,Ks,Xir,appo,appo1,appo2,appo3,appo4,appo5,appo6) &
 !$acc& private(T_dim,c,DelTc,c_p1,d3,d4,d5,d2_KT,T_LJij,T_LJij2,T_LJij3,T_LJij4,T_LJij5,OMij,f) &
 !$acc& private(VISC_ijk,COND_ijk,DIFF_ijk) &
 !$acc& private(p_app,W_app,yyi_ijk,xxir,dtemp,CCp,st,st1,d_ij,stampa) &
 !$acc& copy(i,j,k) copyin(Yi(:,:,:,:),T(:,:,:),Wmix(:,:,:),p(:,:,:),rho(:,:,:)) copyout(VISC(:,:,:),COND(:,:,:),DIFF(:,:,:,:)) 
 do k = MINk(BBB)-GHOST ,MAXk(BBB)+GHOST
    do j = MINj(BBB)-GHOST ,MAXj(BBB)+GHOST
       do i= MINi(BBB)-GHOST ,MAXi(BBB)+GHOST

             !....Some calculations performed correctly.....

             call egspar(T_ijk,  xxir, yyi_ijk, CCp, weg, iweg, st, st1 )

      end do
    end do
  end do

where, for the sake of clarity yyi_ijk is a POINTER. Nevertheless, egspar is written in another file, which unfortunately is written in C language. The function is the following:

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <openacc.h>

int egspar_c ( double T, double *x, double y, double *cp, double *weg, int *iweg ){

double summa, aaa, sss, wwtr, app, app1 ;
int ns, i, j ;

lspar_c(egegeg_.JFLAG, *(weg+egegeg_.IEGPA), T, *(weg+egegeg_.IDLT1), *(weg+egegeg_.IDLT2), *(weg+egegeg_.IDLT3), *(weg+egegeg_.IDLT4),
        *(weg+egegeg_.IDLT5), *(weg+egegeg_.IDLT6), *(weg+egegeg_.IEGRU), egegeg_.NS, (weg+egegeg_.IEGWT), (weg+egegeg_.IBIN),
        (weg+egegeg_.IETA), (weg+egegeg_.IETALG), (weg+egegeg_.IAIJ), (weg+egegeg_.IBIJ), (weg+egegeg_.ICIJ), (weg+egegeg_.ICTAIJ),
        (weg+egegeg_.IFITA), (weg+egegeg_.IFITB), (weg+egegeg_.IFITC), (weg+egegeg_.ICINT), cp, (weg+egegeg_.ICXI),
        (weg+egegeg_.IEPSIJ), (weg+egegeg_.IEGEPS), (weg+egegeg_.IEGCFD), (weg+egegeg_.IEGCFE), (weg+egegeg_.IEGZRT),
        (weg+egegeg_.IEGDIP), (iweg+egegeg_.IEGLIN));

summa = 0.0e0;
ns = egegeg_.NS;

for (int i = 0; i<ns; i++)
  summa = summa + *(x+i);

  *(weg+egegeg_.ISUMTR) = summa;
  aaa = summa / (double) ns;
  sss = 1.0e-16;
  wwtr = 0.0e0;
  for (int i = 0; i<ns; i++) {
    *(weg + egegeg_.IXTR + i ) = *(x + i) + sss *( aaa - *(x + i) );
    wwtr = wwtr + *(weg + egegeg_.IXTR + i ) * *(weg + egegeg_.IEGWT + i );
  }
  *(weg + egegeg_.IWWTR) = wwtr;
  for (int i = 0; i<ns; i++)
    *(weg + egegeg_.IYTR + i ) = *(weg + egegeg_.IXTR + i ) * *(weg + egegeg_.IEGWT + i ) / wwtr;

  egzero_c( ns, weg + egegeg_.IAUX);

  for (int i = 0; i<ns; i++){
    app1 = *(weg + egegeg_.IYTR + i ); app = 0.0e0;
    for (int j = i+1; j<ns; j++){
      app =  app + *(weg + egegeg_.IYTR + j);
      *(weg + egegeg_.IAUX + j) = *(weg + egegeg_.IAUX + j) + app1;
    }
    *(weg + egegeg_.IAUX + i ) = *(weg + egegeg_.IAUX + i) + app;
  }
}

This C code is compiled in the following way:

nvc -c -acc=gpu -target=gpu,noautopar -gpu=cc80 -Mpreprocess -Mfree -Mextend -Munixlogical -Mbyteswapio -traceback -Mchkstk -Mnostack_arrays -Mnofprelaxed -Mnofpapprox -Minfo=accel …/SHARED_FILES/EGSlib_c.c

I don’t understand this type of problem, since, I always do the same with other call to functions or subroutines written in Fortran, but I’ve never had an issue of this type. Could it be related to the fact that is written in C? I have to admit that I have no experience in C programming therefore I have some problems dealing with it.

Thank you in advance,
-Matteo

MatColgrove · February 29, 2024, 5:02pm

This means that the OpenACC compute region is calling a routine that doesn’t have a device version available.

For this, you need to add the “routine” directive in two spots. #1, As part of the routines definition which tells the compiler to create a device version of this routine. In this case would be:

#pragma acc routine seq
int egspar_c ( double T, double *x, double y, double *cp, double *weg, int *iweg ){

double summa, aaa, sss, wwtr, app, app1 ;

The second spot is to tell the caller that the device routine is available. In some cases, these can be combined such as in a module or a C header file since the directive is visible to both the prototype and definition of the routine, but here since they’re two different source files and languages, you do need both.

I presume you have an Fortran interface for “egspar”? If so, the you can add “!$acc routine” to the body of the interface.

If not and you’re instead using F77 style calling conventions, you’d add the following to “MolProp”:

SUBROUTINE MolProp
USE global_mod
USE common_alloc
USE common_mpi
USE openacc
implicit none
!$acc routine(egspar)

matteo.cimini · March 5, 2024, 8:29am

Hi @MatColgrove ,

thank you for your suggestions. I really appreciate it. I implemented all you said but I think I’m missing something. In particular I add the #pragma acc routine seq to the egspar_c routine and also !$acc routine inside the interface in this way:

interface
    subroutine egspar(t, xi, yi, cp, w, iw, st, st1)
    !$acc routine 
      double precision          :: t, st(3), st1(3)
      double precision :: xi(NsMAX), yi(NsMAX), cp(NsMAX), w(NsMAX)
      integer, pointer          :: iw(:)
    end subroutine
  end interface

Now the previous issue is solved but when I compile I have this compilation error:

nvlink error   : Undefined reference to 'egspar_' in 'mol_prop.o'

I also tried different strategies like using !$acc routine(egspar) after the implicit none in the f90 code but I obtain the same nvlink error. Nevertheless, I can say that now the C code is compiled because I obtain “-Minfo=accel” information during the compilation phase, even if, I notice something strange also here. In particular, I see that one of the C routine gives me this warning:

lspar_c:
    102, Accelerator restriction: unsupported statement type: opcode=DEALLOC

nvvmCompileProgram error 9: NVVM_ERROR_COMPILATION.
Error: /tmp/nvaccFXwLbDLp_yW_r.gpu (684, 14): parse use of undefined value '@lspar_c'
NVC++-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (../SHARED_FILES/EGSlib_c.c: 1)
NVC++/x86-64 Linux 23.1-0: compilation completed with warnings

This lspar is a void routine called by egspar. The line 102 that gives me the warning is a return. This routine is the following:

void lspar_c ( int iflag, double patmos, double T, double dlt1, double dlt2, double dlt3, double dlt4, double dlt5, double dlt6,
               double ru, int ns, double *wt, double *bin, double *eta, double *etalg, double *aij, double *bij, double *cij, double *ctaij,
               double *fita, double *fitb, double *fitc, double *cint, double *cpms, double *cxi, double *epsij, double *eps,
               double *cofd, double *cofe, double *zrot, double *dip, int *lin ) {

double p, crot;
double era_c[ns], etalg_c[ns], dr, sq, dr32, dd, dd32 ;
double aaaa, bbbb, aaa;
int rrr, ii, ibin;

const double PI     = 3.1415926535e0; const double PI32O2 = 2.7842e+00;
const double P2O4P2 = 4.4674e+00;  const double PI32   = 5.5683e+00;

dlt1 = log ( T );    dlt2 = dlt1 * dlt1;  dlt3 = dlt2 * dlt1;
dlt4 = dlt3 * dlt1;  dlt5 = dlt4 * dlt1;  dlt6 = dlt5 * dlt1 ;
for (int i = 0; i<ns; i++){
  era_c[i] = etalg_c[i] = 0.0e0;
}

rrr = egscofe_c ( cofe, &ns, &dlt1, &dlt2, &dlt3, eta, etalg );
if (iflag <= 1) return;   <=== LINE 102

egscofd_c ( cofd , ns, dlt1, dlt2, dlt3, bin );

if (iflag <= 2) return;

//       DETERMINE A*, B*, AND C* FOR EACH SPECIES PAIR
if ( iflag == 3 || iflag == 5 ){
  for (int j = 0; j<ns; j++)
    for (int i = j; i<ns; i++){
      *(aij+j*ns+i) = *(fita+(ns*j+i)*7+0 ) + *(fita+(ns*j+i)*7+1) * dlt1 + *(fita+(ns*j+i)*7+2) * dlt2 +
                                              *(fita+(ns*j+i)*7+3) * dlt3 + *(fita+(ns*j+i)*7+4) * dlt4 +
                                              *(fita+(ns*j+i)*7+5) * dlt5 + *(fita+(ns*j+i)*7+6) * dlt6;
      if(i<j)
        *(aij+i*ns+j) = *(aij+j*ns+i);
    }
}
else if (iflag == 7)
  rrr = abc_c ( fita, fitb, fitc, &ns, &dlt1, &dlt2, &dlt3, &dlt4, &dlt5, &dlt6, aij, bij, cij );

  if (iflag == 3) return;

for (int k = 0; k<ns; k++) {
  if(*(lin+k) == 0)     { crot = 0.0e0; *(cint+k) = 0.0e0; }
  else if(*(lin+k) == 1){ crot = 1.0e0; *(cint+k) = *(cpms+k) * (*(wt+k)/ru) - 2.5e0; }
  else if(*(lin+k) == 2){ crot = 1.5e0; *(cint+k) = *(cpms+k) * (*(wt+k)/ru) - 2.5e0; }
  dr   = *(eps+k) / 298.0e0;
  sq   = sqrt(dr);
  dr32 = sq * dr;
  aaaa = ( 1.0e0 + PI32O2 * sq + P2O4P2 * dr + PI32 * dr32);
  dd   = *(eps+k) / T;
  sq   = sqrt(dd);
  dd32 = sq * dd;
  bbbb = ( 1.0e0 + PI32O2 * sq + P2O4P2 * dd + PI32 * dd32);
  *(cxi+k) =  crot  / ((aaaa/bbbb) * max( 1.0e0, (*(zrot+k)) ) * PI );
}

if (iflag <= 5) return;

for (int i = 0; i<ns; i++) {
    ii = i * ns + i;
    if(iflag == 7)
      aaa = *(aij+ii);
    else
      aaa = *(ctaij+ii);
    ibin = i * ns - (i * (i+1)) / 2 + i;
    *(bin+ibin) = 5.0e0 * patmos * (*(wt+i)) / ( 6.0e0 * ru * T * aaa * (*(eta+i)));
    if( *(dip+i) != 0.0e0 )
      *(bin+ibin) = *(bin+ibin) * ( 1.0e0 + 2.985e3 / pow(T,1.5e0));
  }
}

Thank you again for your immense support and above all patience.

-Matteo

MatColgrove · March 5, 2024, 4:50pm

I think for this one you just need to add the ISO_C_BINDING to the C routine so the names match. Something like:

...
use iso_c_binding
....
interface
    subroutine egspar(t, xi, yi, cp, w, iw, st, st1) bind(C, name="egspar_c")
    !$acc routine 
      double precision          :: t, st(3), st1(3)
      double precision :: xi(NsMAX), yi(NsMAX), cp(NsMAX), w(NsMAX)
      integer, pointer          :: iw(:)
    end subroutine
  end interface

Error: /tmp/nvaccFXwLbDLp_yW_r.gpu (684, 14): parse use of undefined value '@lspar_c'

Try adding “#pragma acc routine” to lspar_c as well as any other routines called from device code.

Keep in mind that the “routine” directive must be visible to both the caller and the callee, so it’s best to put it in header file next to the routine’s prototype.

matteo.cimini · March 6, 2024, 12:12pm

Hi Mat,

I implemented the correction you suggested in this way:

...
use, intrinsic :: iso_c_binding
...

IMPLICIT NONE
  
  interface
    subroutine egspar(t, xi, yi, cp, w, iw, st, st1) bind(C, name="egspar_c")
    !$acc routine 
      double precision,      :: t, st(3), st1(3)
      double precision       :: xi(NsMAX), yi(NsMAX), cp(NsMAX), w(NsMAX)
      integer, pointer         :: iw(:)
    end subroutine
  end interface
 
...
Some definitions and calculations
...

call egspar(T_ijk,  xxir, yyi_ijk, CCp, weg, iweg, st, st1)

and I obtain the same error with a different routine name:

nvlink error : Undefined reference to ‘egspar_c’ in ‘mol_prop.o’

It is strange. It’s like the C code is not linked well. For the sake of clarity I say you that I compile the C code in the following manner:

nvc -c -acc=gpu -target=gpu -gpu=cc80 -Minfo=accel EGSlib_c.c

where EGSlib_c.c is the file containing egspar. The final code is compiled in this way:

mpif90 -acc=gpu -target=gpu -gpu=cc80 -o CFD_CODE …Other objets… mol_prop.o EGSlib_c.o -L…/LAPACK -llapack_LINUX -lblas_LINUX

Regarding the other issue (lspar_c), I’ve added the #pragma acc routine to lspar_c but I always obtain the same:

lspar_c:
    104, Accelerator restriction: unsupported statement type: opcode=DEALLOC

nvvmCompileProgram error 9: NVVM_ERROR_COMPILATION.
Error: /tmp/nvaccI4S6cM3hv9M34.gpu (684, 14): parse use of undefined value '@lspar_c'
NVC++-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (../SHARED_FILES/EGSlib_c.c: 1)
NVC++/x86-64 Linux 23.1-0: compilation completed with warnings

I cannot understand what are the differences between this subroutine and the others. It looks fine to me but for sure I’m making some mistakes.

Thank you!!!

-Matteo

MatColgrove · March 6, 2024, 5:51pm

Can you send me your full source? You can direct message me if you can’t post it publicly, or I can send you my email address.

I think this will easier that going back and forth. I’d want to double check that you have the directives in the correct spot. Also lspar_c might not be generating a device routine given the error which I want to look into.

matteo.cimini · March 7, 2024, 8:38am

Thank you Mat! Is it possible to share you the code with all fortran objects (.o) and the parts of interest with the full source code? This is not my fault but just institute policy. Let me know and I can give you my email address in private.

Thank you again! I definitely need a “Mat” in my workplace =)

-Matteo

MatColgrove · March 7, 2024, 4:56pm

We’ll make it work. I’ll send you an email and we can arrange a way to for you to send me the package.

system · March 21, 2024, 4:57pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

MatColgrove · November 26, 2024, 4:26pm

Per a request, I’m adding a summary of the offline discussion.

The “DEALLOC” error is due to the use of VLA’s in the C device code. This was resolved by updating the compiler version to 24.1 or later. Note that while C VLAs are supported in device code, they are not recommended. VLAs are implicitly allocated each time the code enters the device subroutine. Device side mallocs are very expensive for performance and the default device heap size is quite small often leading to heap overflows if the user does not increase the heap size via the environment variable NV_ACC_CUDA_HEAPSIZE or by calling cudaSetLimits.

A secondary issue was while Matteo had a “routine” directive in the Fortran interface, he was missing it in the C code, so was seeing linker error give the C device routine wasn’t getting created.

Topic		Replies	Views
The Fortran OpenACC acceleration code compiles successfully but still runs on the CPU nvc, nvc++ and nvfortran	14	32	December 28, 2024
Implicit data copy to device for allocated arrays using compilation option -stdpar=gpu nvc, nvc++ and nvfortran	11	683	May 31, 2023
Openacc fortran acc routine error [nvlink error : undefined reference to 'subroutine_name_' in 'file_name'] Legacy PGI Compilers	5	1366	March 3, 2023
Questions on incorrect results with openacc in GPU nvc, nvc++ and nvfortran	33	2458	December 4, 2023
Handling global variables in OpenACC kernels nvc, nvc++ and nvfortran	14	1008	August 14, 2023
Problem with NVFORTRAN and R nvc, nvc++ and nvfortran	46	2756	April 25, 2024
OpenACC for code acceleration Legacy PGI Compilers	13	10656	November 6, 2017
OpenACC GPU + MPI - Acceleration preventend in triple loop C code nvc, nvc++ and nvfortran	3	589	October 10, 2023
acc routine and Fortran Legacy PGI Compilers	6	14076	March 13, 2014
Openacc routine directive nvc, nvc++ and nvfortran	3	543	March 27, 2024

OpenACC Accelerator restriction: call to 'function' with no acc routine information

Related topics