PGCC-S-0155-Compiler failed to translate accelerator region

Hi,

I am new to OpenACC and I am trying to accelerate the following subroutine:

void Build_ERIS(vector<double> &eris, vector<Atomic_Orbital> &Basis)
{
  /* Compute the total number of unique repulsion integrals */
  int basis_size = Basis.size();
  int m = basis_size*(basis_size+1)/2;
  eris.resize(m*(m+1)/2);
  bool compute;
  std::fill(eris.begin(), eris.end(), 0);
  #pragma acc data copy(Basis) copy(eris)
  int i_orbital,j_orbital, k_orbital,l_orbital, i_primitive, j_primitive, k_primitive,l_primitive,ij,kl, ijkl,ijij,klkl;
 #pragma acc kernels
  { //This is the line 217 in the error message.
    for(i_orbital=0; i_orbital<basis_size; i_orbital++){
      for(j_orbital=0; j_orbital<i_orbital+1; j_orbital++){
        ij = i_orbital*(i_orbital+1)/2 + j_orbital;
        for(k_orbital=0; k_orbital<basis_size; k_orbital++){
          for(l_orbital=0; l_orbital<k_orbital+1; l_orbital++){
            kl = k_orbital*(k_orbital+1)/2 + l_orbital;
            if (ij >= kl) {


              ijkl = composite_index(i_orbital,j_orbital,k_orbital,l_orbital);

              ijij = composite_index(i_orbital,j_orbital,i_orbital,j_orbital);
              klkl = composite_index(k_orbital,l_orbital,k_orbital,l_orbital);
              //#pragma omp parallel for schedule(dynamic)                                                                                                                                                                                                                   
              for(i_primitive=0; i_primitive<Basis[i_orbital].contraction.size; i_primitive++)
                for(j_primitive=0; j_primitive<Basis[j_orbital].contraction.size; j_primitive++)
                  for(k_primitive=0; k_primitive<Basis[k_orbital].contraction.size; k_primitive++)
                    for(l_primitive=0; l_primitive<Basis[l_orbital].contraction.size; l_primitive++)
                      eris[ijkl] +=
                        normconst(Basis[i_orbital].contraction.exponent[i_primitive],Basis[i_orbital].angular.l, Basis[i_orbital].angular.m, Basis[i_orbital].angular.n)*
                        normconst(Basis[j_orbital].contraction.exponent[j_primitive],Basis[j_orbital].angular.l, Basis[j_orbital].angular.m, Basis[j_orbital].angular.n)*
                        normconst(Basis[k_orbital].contraction.exponent[k_primitive],Basis[k_orbital].angular.l, Basis[k_orbital].angular.m, Basis[k_orbital].angular.n)*
                        normconst(Basis[l_orbital].contraction.exponent[l_primitive],Basis[l_orbital].angular.l, Basis[l_orbital].angular.m, Basis[l_orbital].angular.n)*
                        Basis[i_orbital].contraction.coef[i_primitive]*
                        Basis[j_orbital].contraction.coef[j_primitive]*
                        Basis[k_orbital].contraction.coef[k_primitive]*
                        Basis[l_orbital].contraction.coef[l_primitive]*
                        ERI_int(Basis[i_orbital].contraction.center.x, Basis[i_orbital].contraction.center.y, Basis[i_orbital].contraction.center.z, Basis[i_orbital].contraction.exponent[i_primitive], Basis[i_orbital].angular.l, Basis[i_orbital].angular.m, Basis[i_orbital].angular.n,
                                Basis[j_orbital].contraction.center.x, Basis[j_orbital].contraction.center.y, Basis[j_orbital].contraction.center.z, Basis[j_orbital].contraction.exponent[j_primitive], Basis[j_orbital].angular.l, Basis[j_orbital].angular.m, Basis[j_orbital].angular.n,
                                Basis[k_orbital].contraction.center.x, Basis[k_orbital].contraction.center.y, Basis[k_orbital].contraction.center.z, Basis[k_orbital].contraction.exponent[k_primitive], Basis[k_orbital].angular.l, Basis[k_orbital].angular.m, Basis[k_orbital].angular.n,
                                Basis[l_orbital].contraction.center.x, Basis[l_orbital].contraction.center.y, Basis[l_orbital].contraction.center.z, Basis[l_orbital].contraction.exponent[l_primitive], Basis[l_orbital].angular.l, Basis[l_orbital].angular.m, Basis[l_orbital].angular.n);

              /**/
            }
          }
        }

      }
    }
  }
}

But when I compile the code using the flags:

CFLAGS= -U__GNUG__   -fast -acc  -Minfo=accel -ta=tesla:managed   -larmadillo -lgsl -w -std=c++11

pgc++ fails to accelerate the code and I get the error:

PGCC-S-0155-Compiler failed to translate accelerator region (see -Minfo messages): Could not find allocated-variable index for symbol (engine2.cpp: 217)
Build_ERIS(std::vector<double, std::allocator<double>> &, std::vector<Atomic_Orbital, std::allocator<Atomic_Orbital>>&):
    214, Generating copy(Basis[:],eris[:])
    218, Loop is parallelizable
    219, Loop is parallelizable
         Accelerator kernel generated
         Generating Tesla code
        218, #pragma acc loop gang /* blockIdx.y */
        219, #pragma acc loop gang /* blockIdx.x */
        222, #pragma acc loop vector(32) /* threadIdx.x */
    221, Loop is parallelizable
    222, Loop is parallelizable
    235, Inner sequential loop scheduled on accelerator
PGCC/x86 Linux 15.7-0: compilation completed with severe errors

How can I get past this problem?
Line 217 is marked in the code above. What is going wrong with that line?

Hi rdguerrerom,

  1. I’m not sure that deep copy works here for the Basis and eris variables.
  2. What are composite_index() and ERI_int()? Functions? Mark them with the acc routine pragma or inline them.

Alexey

Hi rdguerrerom,

Alexey is correct that deep copy isn’t supported in OpenACC as of yet. It’s being investigated for the 3.0 standard, but for now you can’t just put aggregate types with dynamic data members in copy clauses. Instead, you need to manage these structures manually in order to copy them over to the accelerator.

std::vector is even trickier since you don’t have direct access to the underlying data structure, so you can’t manage this data yourself. However, it appears that you’re using CUDA Unified Memory (-ta=tesla:managed), so you shouldn’t need to worry about this.

FYI, my lecture from GTC2015 might be helpful here in understanding the issues: https://www.youtube.com/watch?v=rWLmZt_u5u4

Also, for the reference of others, the following article gives full details on using CUDA Unified Memory: http://www.pgroup.com/lit/articles/insider/v6n2a4.htm

While moot since you are using managed memory, your data region doesn’t have a structured block (i.e., there’s no {} after it), so the region only spans the single statement that follows it.

As for the error you’re seeing, I’m not entirely sure. It’s a compiler error, and I see one similar issue report (TPR#19407) with the same message. However, that issue was fixed in 15.5, so your error is something different.

Can you please either post or send to PGI Customer Service (trs@pgroup.com) a reproducing example?

  • Mat

Can these problems be handled using deep copy, like here?

http://www.openacc.org/sites/default/files/TR-14-1.pdf

Or like the “manual deep copy” example here?

http://docs.cray.com/cgi-bin/craydoc.cgi?mode=Show;q=;f=man/xt_prgdirm/82/cat7/openacc.examples.7.html

TR-14-1 is the first draft of an eventual solution for deep copy within the OpenACC standard. However, it is likely to change before being adopted into the standard and has not been implemented yet.

Cray’s example #3 is closer to what you would need to do now. Besides the OpenACC API calls, it can also be accomplished via pragmas.

  • Mat