I am compiling a simple program, but I get the error "Compiler failed to translate accelerator region".
nvc++ -o jsolvec.exe jsolvec.cpp -fast -Minfo=opt -acc -gpu=cc61, -Minfo=accel
When I run this command, I get the following output:
init_simple_diag_dom(int, double*):
61, Zero trip check eliminated
NVC++-S-0155-Compiler failed to translate accelerator region (see -Minfo messages): Could not find allocated-variable index for symbol - A (jsolvec.cpp: 129)
main:
129, Accelerator restriction: size of the GPU copy of A is unknown
Generating implicit firstprivate(nsize)
Generating NVIDIA GPU code
129, #pragma acc loop gang /* blockIdx.x */
132, #pragma acc loop vector(128) /* threadIdx.x */
Generating reduction(+:rsum)
132, Loop is parallelizable
142, Generating implicit firstprivate(nsize)
Generating NVIDIA GPU code
142, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
142, Generating implicit copy(residual) [if not already present]
Generating implicit copyin(xold[:nsize],xnew[:nsize]) [if not already present]
NVC++-F-0704-Compilation aborted due to previous errors. (jsolvec.cpp)
NVC++/x86-64 Linux 24.3-0: compilation aborted
What is causing this error? Can anyone help?
Thanks
This is a generic error that just means the compiler was unable to create the device kernel. Here, it can’t find a dynamically allocated variable, though there’s not enough information for me to determine which one.
Can you please post a reproducing example?
Thanks,
Mat
while ((residual > TOLERANCE) && (iters < max_iters)) {
    ++iters;
    // swap input and output vectors
    xtmp = xnew;
    xnew = xold;
    xold = xtmp;
    #pragma acc parallel loop
    for (i = 0; i < nsize; ++i) {
        TYPE rsum = (TYPE)0;
        #pragma acc loop reduction(+:rsum)
        for (j = 0; j < nsize; ++j) {
            if (i != j) rsum += A[i*nsize + j] * xold[j];
        }
        xnew[i] = (b[i] - rsum) / A[i*nsize + i];
    }
    //
    // test convergence, sqrt(sum((xnew-xold)**2))
    //
    residual = 0.0;
    #pragma acc parallel loop reduction(+:residual)
    for (i = 0; i < nsize; i++) {
        TYPE dif;
        dif = xnew[i] - xold[i];
        residual += dif * dif;
    }
    residual = sqrt((double)residual);
    if (iters % riter == 0 ) cout << "Iteration " << iters << ", residual is " << residual << "\n";
}
elapsed_time = omp_get_wtime() - start_time;
cout << "\nConverged after " << iters << " iterations and " << elapsed_time << " seconds, residual is " << residual << "\n";
This is the code block in which I made changes by adding the #pragma acc directives.
The lines
#pragma acc parallel loop
for (i = 0; i < nsize; ++i)
are lines 128 and 129 of the source file, which is where the error is reported.
Can you please help me with this information?
Without a reproducing example, I can’t be sure what’s happening, but we can try some things.
129, Accelerator restriction: size of the GPU copy of A is unknown
Here, the compiler can’t implicitly copy “A” because it can’t derive the size of the array: “A” is accessed through a computed index (i*nsize + j) rather than directly by the loop iteration variables.
To fix, add a “copyin(A[:size])” to the “parallel loop”, replacing “size” with the actual number of elements. If “A” is copied to the device via an outer data region, you can use “present(A)” instead.
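As a rough illustration of that suggestion, here is a minimal sketch of the first loop with an explicit copyin. It assumes that A is a dense nsize x nsize matrix stored row-major in a single allocation, so the element count is nsize*nsize; the thread does not show how A is allocated, so treat that size as an assumption. The function name jacobi_sweep and the use of double instead of TYPE are hypothetical, chosen only to make the example self-contained. Without an OpenACC compiler the pragmas are ignored and the loop simply runs on the host.

```cpp
#include <cassert>

// Hypothetical helper: one Jacobi sweep, xnew[i] = (b[i] - sum_{j!=i} A[i][j]*xold[j]) / A[i][i].
// Assumption: A holds nsize*nsize doubles in row-major order.
void jacobi_sweep(int nsize, const double* A, const double* b,
                  const double* xold, double* xnew) {
    assert(nsize > 0);
    // The explicit shape A[:nsize*nsize] tells the compiler how much to copy,
    // which it cannot infer from the computed index i*nsize + j.
    #pragma acc parallel loop copyin(A[:nsize*nsize], b[:nsize], xold[:nsize]) copyout(xnew[:nsize])
    for (int i = 0; i < nsize; ++i) {
        double rsum = 0.0;
        #pragma acc loop reduction(+:rsum)
        for (int j = 0; j < nsize; ++j) {
            if (i != j) rsum += A[i * nsize + j] * xold[j];
        }
        xnew[i] = (b[i] - rsum) / A[i * nsize + i];
    }
}
```

If A is already on the device through an enclosing data region, the copyin for A can be swapped for present(A) as described above.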
Generating implicit copyin(xold[:nsize],xnew[:nsize]) [if not already present]
Here, the compiler is implicitly copying xold and xnew, but you may want to copy them explicitly or put them in a “present” clause if they are already in a data region.
This is optional since a “copy” first tests if the variable is already on the device, but I prefer making it explicit by adding the “present” or “default(present)” clause.
If you’re not doing it already, I highly suggest using a data region outside of the while loop, else your code will be copying data to/from the device with each iteration.
The pointer swapping is fine since the “present” test is associated with the host address, not the variable name.
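The data-region advice could look roughly like the sketch below. It is not the original program: the function name jacobi_solve, its parameters (guess, scratch, result) and the use of double are hypothetical, and the size nsize*nsize for A is again an assumption about how the matrix is allocated. The data region copies everything to the device once, the two loops only assert presence, and the host pointers are swapped each iteration as in the posted code.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Hypothetical solver loop. Assumptions: A has nsize*nsize elements (row-major),
// b, guess and scratch each have nsize elements, guess holds the initial guess.
// Returns the iteration count; *result points at the buffer with the latest iterate.
int jacobi_solve(int nsize, const double* A, const double* b,
                 double* guess, double* scratch,
                 double tol, int max_iters, double** result) {
    assert(nsize > 0);
    double* xnew = guess;    // becomes xold after the first swap
    double* xold = scratch;
    double residual = tol + 1.0;
    int iters = 0;

    // Copy to the device once; both vectors are copied back when the region ends.
    #pragma acc data copyin(A[:nsize*nsize], b[:nsize]) copy(guess[:nsize], scratch[:nsize])
    {
        while (residual > tol && iters < max_iters) {
            ++iters;
            std::swap(xold, xnew);  // only the host pointers change

            // Explicit present clause: no data movement, just a presence check.
            #pragma acc parallel loop present(A, b, xold, xnew)
            for (int i = 0; i < nsize; ++i) {
                double rsum = 0.0;
                #pragma acc loop reduction(+:rsum)
                for (int j = 0; j < nsize; ++j) {
                    if (i != j) rsum += A[i * nsize + j] * xold[j];
                }
                xnew[i] = (b[i] - rsum) / A[i * nsize + i];
            }

            // default(present) is the alternative spelling for the arrays used here.
            residual = 0.0;
            #pragma acc parallel loop default(present) reduction(+:residual)
            for (int i = 0; i < nsize; ++i) {
                double dif = xnew[i] - xold[i];
                residual += dif * dif;
            }
            residual = std::sqrt(residual);
        }
    }
    *result = xnew;
    return iters;
}
```

A caller would pass two nsize-length buffers and read the answer through the returned pointer, for example `jacobi_solve(n, A, b, x0, x1, 1e-10, 1000, &x)`. Because the presence lookup uses the host address that xold and xnew currently hold, swapping them does not trigger any transfers inside the loop.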