I am using PGI pgc++ version 20.4 in a Linux environment. If I add a #pragma acc cache() directive inside an OpenACC kernels loop, the compiler complains with “PGC++-S-0155-Procedures called in a compute region must have acc routine information:”. Manually adding "#pragma acc routine seq" to every routine in all the include files would be impractical.
The code compiles and runs fine without the cache() directive. Is this a known limitation?
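For readers unfamiliar with the workaround mentioned above, here is a minimal sketch of what an explicit "acc routine seq" annotation looks like on a single function. The names scaled_enthalpy and sum_scaled_enthalpy are hypothetical and are not part of the reproducer posted later in this thread. The sketch only illustrates the annotation and does not attempt to reproduce the cache() problem.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (an assumption for illustration only).
// The explicit "routine seq" asks an OpenACC compiler to also build a
// sequential device version of this function. Compilers without
// OpenACC support ignore the pragma and compile it as ordinary C++.
#pragma acc routine seq
inline double scaled_enthalpy(double t, double gas_constant) {
    return gas_constant * t;
}

// Hypothetical driver: calls the annotated helper from a compute loop
// and accumulates the result with a sum reduction.
inline double sum_scaled_enthalpy(const std::vector<double>& temps,
                                  double gas_constant) {
    const double* t = temps.data();
    const std::size_t n = temps.size();
    double sum = 0.0;
    #pragma acc parallel loop reduction(+:sum) copyin(t[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        sum += scaled_enthalpy(t[i], gas_constant);
    }
    return sum;
}
```

In a header-only library such as Eigen, every function reachable from the loop body would need the same annotation, which is why doing this by hand does not scale.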
Hi Daniel,
No, this is not a known issue, nor was I able to replicate the behavior in some of my C++ codes. So unfortunately, I’m not sure what the cause is in your case.
Would you be able to provide a reproducing example so I can investigate?
Thanks,
Mat
I’ll see if I can whittle down a reproducer.
I have a relatively small reproducer. How do I attach a file?
cat mainOpenACC_cache.cpp
#include <array>   // std::array (lbound_temps)
#include <cmath>
#include <cstdio>
#include <cstdlib> // malloc, free
#include <exception>
#include <iostream>
#include <string>
#include <vector>
#include <Eigen/Core>
#include "openacc.h"
using std::size_t;
constexpr size_t ns = 11; // Number of species
constexpr double gas_constant = 8.3145; // J/mol-K
struct Result {
double mean_enthalpy_rate; // W/m3
double elapsed_time; // sec
};
//----------------------------------------------------------------------
// OpenACC Source Kernel
//----------------------------------------------------------------------
Result calc_source_OpenACC(size_t nx, size_t ny, size_t nz) {
const auto nxyz = nx*ny*nz;
double* __restrict__ enthalpy = (double*) malloc(nxyz*sizeof(double));
double sum_enthalpy = 0.0;
static std::array<double,5> lbound_temps = {0.0, 200.0, 1000.0, 6000.0, 20000.0};
#pragma acc data copyin(lbound_temps[0:5],gas_constant), copy(sum_enthalpy), copyout(enthalpy[0:nxyz])
{
size_t n = 0;
#pragma acc kernels loop independent, present(lbound_temps,gas_constant,enthalpy) reduction(+:sum_enthalpy)
for (n = 0; n < nxyz; ++n) {
#pragma acc cache(lbound_temps[0:5]) // Why does this disable implicit acc routine seq?
Eigen::Matrix<double, ns, 1> enthalpies;
Eigen::Matrix<double, ns, 1> temp_deriv;
Eigen::Matrix<double, 9, 1> tvec;
const double t = 315.15;
Eigen::Matrix<double, ns, 9> coeff;
const double tinv = 1.0/(t + 1.0e-20);
const double tlog = log(t + 1.0e-20);
tvec[0] = -tinv;
tvec[1] = tlog;
tvec[2] = t;
tvec[3] = tvec[2]*t*(1.0/2.0);
tvec[4] = tvec[3]*t*(2.0/3.0);
tvec[5] = tvec[4]*t*(3.0/4.0);
tvec[6] = tvec[5]*t*(4.0/5.0);
tvec[7] = 1.0;
tvec[8] = 0.0;
enthalpies = coeff*tvec;
double dhdt = 0.0;
enthalpy[n] = dhdt;
sum_enthalpy += dhdt;
};
} // end data region
free(enthalpy);
return {sum_enthalpy/(nx*ny*nz), 1.0};
}
int main(int argc, char* argv[]) {
// Process arguments
size_t nx = (argc > 1)? std::stoi(argv[1]) : 64;
size_t ny = (argc > 2)? std::stoi(argv[2]) : 64;
size_t nz = (argc > 3)? std::stoi(argv[3]) : 64;
auto [ha, ta] = calc_source_OpenACC(nx, ny, nz);
printf("OpenACC Results:");
printf(" Mean Enthalpy Rate: %.2f W/cm3\n", ha/1e6);
printf(" Elapsed Time: %.2f msec\n", ta*1e3);
printf(" Time Per Cell: %.3f usec\n\n", ta*1e6/(nx*ny*nz));
return 0;
}
git clone <URL of the libeigen/eigen repository on GitLab>
/pgi/20.4/linux86-64/20.4/bin/pgc++ -I./eigen -fast -O3 -DNDEBUG -acc -ta=tesla:cc70 -Minfo=accel -ta=tesla:lineinfo -DEIGEN_DONT_VECTORIZE=1 -fPIC --c++17 -A -o main.x mainOpenACC_cache.cpp
Thanks Daniel. I have reported the problem as TPR #28705. I also noticed a device code generation error when I removed the cache directive, which I reported separately as TPR #28706.
Note that we do have open problems when compiling code that uses Eigen. For example, without “-DEIGEN_DONT_VECTORIZE=1”, the compile gives an internal compiler error. It is not yet clear whether all these issues are related, but it is possible.