OpenACC cache directive disables "implicit acc routine seq"

I'm using PGI pgc++ version 20.4 on Linux. If I add a #pragma acc cache() directive inside an OpenACC kernels loop, the compiler complains with “PGC++-S-0155-Procedures called in a compute region must have acc routine information:”. Manually adding "#pragma acc routine seq" to every routine in all the include files would be impractical.

The code compiles and runs fine without the cache() directive. Is this a known limitation?
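
For illustration, the explicit annotation that the message asks for looks roughly like the sketch below. The helper `bound_ratio`, the function `sum_ratios`, and the values used are hypothetical and not taken from the real code; the sketch only shows where `#pragma acc routine seq` and `#pragma acc cache` would sit. A compiler without OpenACC support simply ignores the pragmas, so the result is the same on the host.

```cpp
#include <cstddef>

// Hypothetical helper standing in for one of the many routines that
// normally come from header files. The explicit directive below is what
// the compiler message asks for when it does not add it implicitly.
#pragma acc routine seq
inline double bound_ratio(const double* bounds, double t) {
    // Find the largest positive lower bound that does not exceed t.
    double lb = bounds[1];
    for (int i = 1; i < 5; ++i) {
        if (bounds[i] <= t) lb = bounds[i];
    }
    return t / lb;
}

// Hypothetical compute loop: caches the small table and calls the helper.
double sum_ratios(std::size_t n) {
    double bounds[5] = {0.0, 200.0, 1000.0, 6000.0, 20000.0};
    double sum = 0.0;
    #pragma acc parallel loop copyin(bounds[0:5]) reduction(+:sum)
    for (std::size_t i = 0; i < n; ++i) {
        #pragma acc cache(bounds[0:5])
        sum += bound_ratio(bounds, 400.0);
    }
    return sum;
}
```

Doing this by hand for one helper is easy; the problem is that with the cache directive present, every routine reached from the loop would need the same treatment.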

Hi Daniel,

No, this is not a known issue, and I was not able to replicate the behavior in some of my own C++ codes. So unfortunately, I’m not sure what the cause is in your case.

Would you be able to provide a reproducing example so I can investigate?

Thanks,
Mat

I’ll see if I can whittle down a reproducer.

I have a relatively small reproducer. How do I attach a file?

cat mainOpenACC_cache.cpp

#include <array>    // std::array
#include <cmath>
#include <cstdio>
#include <cstdlib>  // malloc, free
#include <exception>
#include <iostream>
#include <string>
#include <vector>
#include <Eigen/Core>
#include "openacc.h"

using std::size_t;

constexpr size_t ns  = 11;        // Number of species
constexpr double gas_constant = 8.3145; // J/mol-K

struct Result {
    double mean_enthalpy_rate;  // W/m3
    double elapsed_time;        // sec
};

//----------------------------------------------------------------------
// OpenACC Source Kernel
//----------------------------------------------------------------------
Result calc_source_OpenACC(size_t nx, size_t ny, size_t nz) {
    const auto nxyz = nx*ny*nz;
    double* __restrict__ enthalpy     = (double*) malloc(nxyz*sizeof(double));

    double sum_enthalpy = 0.0;
    static std::array<double,5> lbound_temps = {0.0, 200.0, 1000.0, 6000.0, 20000.0};
    #pragma acc data copyin(lbound_temps[0:5],gas_constant), copy(sum_enthalpy), copyout(enthalpy[0:nxyz])
    {

    size_t n = 0;

    #pragma acc kernels loop independent, present(lbound_temps,gas_constant,enthalpy) reduction(+:sum_enthalpy)
    for (n = 0; n < nxyz; ++n) {
        #pragma acc cache(lbound_temps[0:5]) // Why does this disable implicit acc routine seq?

        Eigen::Matrix<double, ns, 1> enthalpies;
        Eigen::Matrix<double, ns, 1> temp_deriv;
        Eigen::Matrix<double,  9, 1> tvec;

        const double t = 315.15;

        Eigen::Matrix<double, ns, 9> coeff;
        const double tinv = 1.0/(t + 1.0e-20);
        const double tlog = log(t + 1.0e-20);

        tvec[0] = -tinv;
        tvec[1] =  tlog;
        tvec[2] =  t;
        tvec[3] =  tvec[2]*t*(1.0/2.0);
        tvec[4] =  tvec[3]*t*(2.0/3.0);
        tvec[5] =  tvec[4]*t*(3.0/4.0);
        tvec[6] =  tvec[5]*t*(4.0/5.0);
        tvec[7] =  1.0;
        tvec[8] =  0.0;
        enthalpies = coeff*tvec;

        double dhdt = 0.0;
        enthalpy[n] = dhdt;
        sum_enthalpy += dhdt;
    };

    } // end data region

    free(enthalpy);
    return {sum_enthalpy/(nx*ny*nz), 1.0};
}

int main(int argc, char* argv[]) {

    // Process arguments
    size_t nx = (argc > 1)? std::stoi(argv[1]) : 64;
    size_t ny = (argc > 2)? std::stoi(argv[2]) : 64;
    size_t nz = (argc > 3)? std::stoi(argv[3]) : 64;

    auto [ha, ta] = calc_source_OpenACC(nx, ny, nz);
    printf("OpenACC Results:\n");
    printf("  Mean Enthalpy Rate:  %.2f W/cm3\n",  ha/1e6);
    printf("  Elapsed Time:        %.2f msec\n",   ta*1e3);
    printf("  Time Per Cell:       %.3f usec\n\n", ta*1e6/(nx*ny*nz));

    return 0;
}

git clone https://gitlab.com/libeigen/eigen.git
/pgi/20.4/linux86-64/20.4/bin/pgc++ -I./eigen -fast -O3 -DNDEBUG -acc -ta=tesla:cc70 -Minfo=accel -ta=tesla:lineinfo -DEIGEN_DONT_VECTORIZE=1 -fPIC --c++17 -A -o main.x mainOpenACC_cache.cpp

Thanks Daniel. I have reported the problem as TPR #28705. I also noticed a device code generation error when I removed the cache directive, which I reported separately as TPR #28706.

Note that we do have open problems when compiling code that uses Eigen. For example, without “-DEIGEN_DONT_VECTORIZE=1”, the compile gives an internal compiler error. It's not yet clear whether all these issues are related, but it's possible.