NVC++-S-0155-Compiler failed to translate accelerator region

I am compiling a simple code, but I am getting the error "Compiler failed to translate accelerator region".

nvc++ -o jsolvec.exe jsolvec.cpp -fast -Minfo=opt -acc -gpu=cc61 -Minfo=accel

When I run this command, I get the following output:

init_simple_diag_dom(int, double*):
     61, Zero trip check eliminated
NVC++-S-0155-Compiler failed to translate accelerator region (see -Minfo messages): Could not find allocated-variable index for symbol - A (jsolvec.cpp: 129)
main:
    129, Accelerator restriction: size of the GPU copy of A is unknown
         Generating implicit firstprivate(nsize)
         Generating NVIDIA GPU code
        129, #pragma acc loop gang /* blockIdx.x */
        132, #pragma acc loop vector(128) /* threadIdx.x */
             Generating reduction(+:rsum)
    132, Loop is parallelizable
    142, Generating implicit firstprivate(nsize)
         Generating NVIDIA GPU code
        142, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
    142, Generating implicit copy(residual) [if not already present]
         Generating implicit copyin(xold[:nsize],xnew[:nsize]) [if not already present]
NVC++-F-0704-Compilation aborted due to previous errors. (jsolvec.cpp)
NVC++/x86-64 Linux 24.3-0: compilation aborted

What is the error here? Can anyone help?

thanks

This is a generic error that just means the compiler was unable to create the device kernel. Here it can’t find a dynamically allocated variable, though there’s not enough information for me to determine which variable.

Can you please post a reproducing example?

Thanks,
Mat

  while ((residual > TOLERANCE) && (iters < max_iters)) {
    ++iters;
    // swap input and output vectors
    xtmp = xnew;
    xnew = xold;
    xold = xtmp;
  #pragma acc parallel loop 
    for (i = 0; i < nsize; ++i) {
      TYPE rsum = (TYPE)0;
    #pragma acc loop reduction(+:rsum)
      for (j = 0; j < nsize; ++j) {
        if (i != j) rsum += A[i*nsize + j] * xold[j];
      }
      xnew[i] = (b[i] - rsum) / A[i*nsize + i];
    }
    //
    // test convergence, sqrt(sum((xnew-xold)**2))
    //
    residual = 0.0;
   #pragma acc parallel loop reduction(+:residual)
    for (i = 0; i < nsize; i++) {
      TYPE dif;
      dif = xnew[i] - xold[i];
      residual += dif * dif;
    }
    residual = sqrt((double)residual);
    if (iters % riter == 0 ) cout << "Iteration " << iters << ", residual is " << residual << "\n";
  }
  elapsed_time = omp_get_wtime() - start_time;
  cout << "\nConverged after " << iters << " iterations and " << elapsed_time << " seconds, residual is " << residual << "\n";

This is the code block in which I made changes by adding the pragma acc statements, and

#pragma acc parallel loop
    for (i = 0; i < nsize; ++i)

these are lines 128 and 129 of the code, for which the error is being shown.

Can you please help me with this information?

Without a reproducing example, I can’t be sure what’s happening, but we can try some things.

129, Accelerator restriction: size of the GPU copy of A is unknown

Here, the compiler can’t implicitly copy “A” since it can’t derive the size of the array, given that it’s accessed with a computed index rather than the loop iteration variables.

To fix this, add a “copyin(A[:size])” clause to the “parallel loop” directive, replacing “size” with the actual number of elements. If “A” is already copied to the device via an outer data region, you can use “present(A)” instead.
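For example, here’s a minimal sketch of the first loop, assuming “A” is a flat array of nsize*nsize elements as the A[i*nsize + j] indexing suggests:

    // Give the compiler the extent of A explicitly so it can create the device copy
    #pragma acc parallel loop copyin(A[0:nsize*nsize])
    for (i = 0; i < nsize; ++i) {
      TYPE rsum = (TYPE)0;
      #pragma acc loop reduction(+:rsum)
      for (j = 0; j < nsize; ++j) {
        if (i != j) rsum += A[i*nsize + j] * xold[j];
      }
      xnew[i] = (b[i] - rsum) / A[i*nsize + i];
    }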

Generating implicit copyin(xold[:nsize],xnew[:nsize]) [if not already present]

Here, the compiler is implicitly copying xold and xnew, but you may want to copy them explicitly, or put them in a “present” clause if they are already in a data region.

This is optional, since a “copy” first tests whether the variable is already on the device, but I prefer making it explicit by adding a “present” or “default(present)” clause.
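For example, on the convergence loop (a sketch, assuming the arrays it touches are already on the device from an enclosing data region):

    // All arrays referenced in the loop must already be present on the device
    #pragma acc parallel loop default(present) reduction(+:residual)
    for (i = 0; i < nsize; i++) {
      TYPE dif;
      dif = xnew[i] - xold[i];
      residual += dif * dif;
    }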

If you’re not doing so already, I highly suggest putting a data region outside of the while loop; otherwise your code will copy data to/from the device on every iteration.
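A sketch of what that could look like, assuming the array sizes shown here match how the arrays are actually allocated:

    // Copy the matrix and vectors once and keep them resident for all iterations
    #pragma acc data copyin(A[0:nsize*nsize], b[0:nsize]) copy(xold[0:nsize], xnew[0:nsize])
    {
      while ((residual > TOLERANCE) && (iters < max_iters)) {
        ++iters;
        // ... same loop nests as above, now using present(...) or
        // default(present) on the parallel loops instead of copy clauses ...
      }
    } // end of data region: xold/xnew are copied back to the host here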

The pointer swapping is fine since the “present” test is associated with the host address, not the variable name.