NVC++-S-0155-Compiler failed to translate accelerator region

jeeadv2021failure · March 26, 2024, 1:15pm

I am compiling a simple code, but I am getting the error "Compiler failed to translate accelerator region".

nvc++ -o jsolvec.exe jsolvec.cpp -fast -Minfo=opt -acc -gpu=cc61, -Minfo=accel

When I run this command, I get the following output:

init_simple_diag_dom(int, double*):
     61, Zero trip check eliminated
NVC++-S-0155-Compiler failed to translate accelerator region (see -Minfo messages): Could not find allocated-variable index for symbol - A (jsolvec.cpp: 129)
main:
    129, Accelerator restriction: size of the GPU copy of A is unknown
         Generating implicit firstprivate(nsize)
         Generating NVIDIA GPU code
        129, #pragma acc loop gang /* blockIdx.x */
        132, #pragma acc loop vector(128) /* threadIdx.x */
             Generating reduction(+:rsum)
    132, Loop is parallelizable
    142, Generating implicit firstprivate(nsize)
         Generating NVIDIA GPU code
        142, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
    142, Generating implicit copy(residual) [if not already present]
         Generating implicit copyin(xold[:nsize],xnew[:nsize]) [if not already present]
NVC++-F-0704-Compilation aborted due to previous errors. (jsolvec.cpp)
NVC++/x86-64 Linux 24.3-0: compilation aborted

What is causing this error? Can anyone help?

Thanks

MatColgrove · March 26, 2024, 3:08pm

This is a generic error that just means the compiler was unable to create the device kernel. Here it can’t find a dynamically allocated variable, though there’s not enough information for me to determine which variable.

Can you please post a reproducing example?

Thanks,
Mat

jeeadv2021failure · March 26, 2024, 4:38pm

  while ((residual > TOLERANCE) && (iters < max_iters)) {
    ++iters;
    // swap input and output vectors
    xtmp = xnew;
    xnew = xold;
    xold = xtmp;
    #pragma acc parallel loop
    for (i = 0; i < nsize; ++i) {
      TYPE rsum = (TYPE)0;
      #pragma acc loop reduction(+:rsum)
      for (j = 0; j < nsize; ++j) {
        if (i != j) rsum += A[i*nsize + j] * xold[j];
      }
      xnew[i] = (b[i] - rsum) / A[i*nsize + i];
    }
    //
    // test convergence, sqrt(sum((xnew-xold)**2))
    //
    residual = 0.0;
    #pragma acc parallel loop reduction(+:residual)
    for (i = 0; i < nsize; i++) {
      TYPE dif;
      dif = xnew[i] - xold[i];
      residual += dif * dif;
    }
    residual = sqrt((double)residual);
    if (iters % riter == 0 ) cout << "Iteration " << iters << ", residual is " << residual << "\n";
  }
  elapsed_time = omp_get_wtime() - start_time;
  cout << "\nConverged after " << iters << " iterations and " << elapsed_time << " seconds, residual is " << residual << "\n";

This is the code block in which I made changes by adding the pragma acc statements. The lines

#pragma acc parallel loop
    for (i = 0; i < nsize; ++i)

are lines 128 and 129 of the code, for which the error is being shown.

Can you please help me with this information?

MatColgrove · March 26, 2024, 5:46pm

Without a reproducing example, I can’t be sure what’s happening, but we can try some things.

129, Accelerator restriction: size of the GPU copy of A is unknown

Here, the compiler can’t implicitly copy “A” because it can’t derive the size of the array: “A” is indexed with a computed expression (i*nsize + j) rather than directly with the loop iteration variables.

To fix this, add a “copyin(A[:size])” clause to the “parallel loop” directive, replacing “size” with the actual number of elements. If “A” has already been copied to the device via an outer data region, you can use “present(A)” instead.
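As a minimal sketch of that fix, the first loop from the posted code could be written as below. The function name jacobi_sweep, the use of double in place of TYPE, and the assumption that A holds nsize*nsize elements in row-major order are illustrative guesses, not taken from the original source. Without OpenACC enabled the pragmas are ignored and the loop simply runs on the host.

```cpp
// One Jacobi sweep: xnew[i] = (b[i] - sum_{j != i} A[i][j] * xold[j]) / A[i][i].
// Assumption: A is a flattened, row-major nsize x nsize matrix.
void jacobi_sweep(const double* A, const double* b, const double* xold,
                  double* xnew, int nsize) {
  // Explicit shapes: the compiler cannot infer the extent of A from the
  // computed index A[i*nsize + j], so the element count is given here.
  #pragma acc parallel loop copyin(A[0:nsize*nsize], b[0:nsize], xold[0:nsize]) copyout(xnew[0:nsize])
  for (int i = 0; i < nsize; ++i) {
    double rsum = 0.0;
    #pragma acc loop reduction(+:rsum)
    for (int j = 0; j < nsize; ++j) {
      if (i != j) rsum += A[i * nsize + j] * xold[j];
    }
    xnew[i] = (b[i] - rsum) / A[i * nsize + i];
  }
}
```

Putting the data clauses directly on the compute construct like this moves the arrays on every call, so it is mainly useful for checking that the kernel now compiles.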

Generating implicit copyin(xold[:nsize],xnew[:nsize]) [if not already present]

Here, the compiler is implicitly copying xold and xnew, but you may want to explicitly copy them or put them in a “present” clause if they are already in a data region.

This is optional since a “copy” first tests if the variable is already on the device, but I prefer making it explicit by adding the “present” or “default(present)” clause.

If you’re not doing it already, I strongly suggest using a data region outside of the while loop; otherwise your code will copy data to and from the device on every iteration.

The pointer swapping is fine since the “present” test is associated with the host address, not the variable name.
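Putting these suggestions together, a rough sketch of the whole solver loop might look like the following. The function jacobi_solve, its parameters, and the "work" scratch buffer are hypothetical names introduced for illustration, and double stands in for TYPE. The data region is written against the unswapped pointers x and work, while the swapped xold/xnew pointers are only used in “present” clauses inside the loop.

```cpp
#include <algorithm>
#include <cmath>
#include <utility>

// Hypothetical driver: x holds the initial guess on entry and the result on
// exit; work is a scratch buffer of nsize elements. Returns the iteration count.
int jacobi_solve(const double* A, const double* b, double* x, double* work,
                 int nsize, double tolerance, int max_iters) {
  double* xnew = x;     // after the first swap, xold points at the initial guess
  double* xold = work;
  double residual = tolerance + 1.0;
  int iters = 0;

  // Data region outside the while loop: arrays move to the device once.
  #pragma acc data copyin(A[0:nsize*nsize], b[0:nsize]) copy(x[0:nsize]) copyout(work[0:nsize])
  {
    while (residual > tolerance && iters < max_iters) {
      ++iters;
      std::swap(xold, xnew);  // host pointers swap; both buffers stay present

      #pragma acc parallel loop present(A, b, xold, xnew)
      for (int i = 0; i < nsize; ++i) {
        double rsum = 0.0;
        #pragma acc loop reduction(+:rsum)
        for (int j = 0; j < nsize; ++j) {
          if (i != j) rsum += A[i * nsize + j] * xold[j];
        }
        xnew[i] = (b[i] - rsum) / A[i * nsize + i];
      }

      // Convergence test: sqrt(sum((xnew - xold)^2))
      residual = 0.0;
      #pragma acc parallel loop present(xold, xnew) reduction(+:residual)
      for (int i = 0; i < nsize; ++i) {
        double dif = xnew[i] - xold[i];
        residual += dif * dif;
      }
      residual = std::sqrt(residual);
    }
  }

  // Both buffers are back on the host here; make sure the result lands in x.
  if (xnew != x) std::copy(xnew, xnew + nsize, x);
  return iters;
}
```

Only the scalar residual needs to come back to the host each iteration; the arrays are transferred once on entry to the data region and once on exit.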
