Hello,

I am having trouble with the OpenACC cache directive when compiling with the PGI compiler (version 13.1). Here is the relevant loop nest:

```
int iters = 0;
#pragma acc data copy(Uold), copyin(rhs), create(Unew)
{
    while (iters < max_iters && l2_norm > 1e-9) {
        ++iters;
        /* update each interior point */
        #pragma acc kernels loop independent
        for (k = 1; k <= n; k++) {
            #pragma acc loop independent
            for (j = 1; j <= n; j++) {
                #pragma acc loop independent
                for (i = 1; i <= n; i++) {
                    #pragma acc cache(rhs[k-1:k+1][j-1:j+1][i-1:i+1])
                    //#pragma acc cache(Uold[k-1:k+1][j-1:j+1][i-1:i+1])
                    Unew[k][j][i] = factor*(rhs[k][j][i]
                        + factor2*(Uold[k][j][i-1] + Uold[k][j][i+1]
                                 + Uold[k][j-1][i] + Uold[k][j+1][i]
                                 + Uold[k+1][j][i] + Uold[k-1][j][i])
                        + Uold[k-1][j-1][i] + Uold[k-1][j+1][i]
                        + Uold[k-1][j][i-1] + Uold[k-1][j][i+1]
                        + Uold[k][j-1][i-1] + Uold[k][j+1][i-1]
                        + Uold[k][j-1][i+1] + Uold[k][j+1][i+1]
                        + Uold[k+1][j-1][i] + Uold[k+1][j+1][i]
                        + Uold[k+1][j][i-1] + Uold[k+1][j][i+1]);
                }
            }
        }
        /* pointer swap */
        REAL*** tmp;
        REAL*** p_old = Uold;
        REAL*** p_new = Unew;
        tmp = p_old; p_old = p_new; p_new = tmp;
        nIters = iters;
    }
}
```

If I uncomment the second #pragma acc cache line (the one for Uold), the compiler fails to translate the accelerator region and prints:

```
PGC-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Unexpected load/store type (openacc_poisson.c: 149)
main:
    140, getTime inlined, size=6, file openacc_poisson.c (67)
    144, Generating create(Unew[0:][0:][0:])
         Generating copyin(rhs[0:][0:][0:])
         Generating copy(Uold[0:][0:][0:])
    150, Loop is parallelizable
    152, Loop is parallelizable
    154, Loop is parallelizable
         Accelerator kernel generated
        150, #pragma acc loop vector(4) /* threadIdx.y */
        152, #pragma acc loop gang /* blockIdx.y */
             Cached references to size [(y+2)x3x(x+2)] block of 'rhs'
             Cached references to size [(y+2)x3x(x+2)] block of 'Uold'
        154, #pragma acc loop gang, vector(64) /* blockIdx.x threadIdx.x */
    178, getTime inlined, size=6, file openacc_poisson.c (67)
PGC/x86-64 Linux 13.1-1: compilation completed with warnings
```

It seems that the PGI compiler has a problem with the *j+1* index into my Uold array. If I remove the loads from Uold that use index j+1, the compilation succeeds, but of course the program then no longer produces correct results.
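
In case it helps to reproduce this, below is a stripped-down sketch of the same loop nest with both cache directives enabled. It is not my actual code: the size `N`, the statically sized arrays (my real arrays are `REAL***`), the `double` typedef for `REAL`, and the helper names `fill_inputs` and `stencil_sweep` are assumptions made for illustration, and I have not confirmed that this reduced version triggers the same warning.

```c
#define N 8            /* assumed interior size, kept small on purpose */
typedef double REAL;   /* assumption: REAL is double */

/* Statically sized arrays with a one-cell halo on each side (N + 2). */
static REAL Uold[N + 2][N + 2][N + 2];
static REAL Unew[N + 2][N + 2][N + 2];
static REAL rhs[N + 2][N + 2][N + 2];

/* Hypothetical helper: set every cell of Uold to u and of rhs to r. */
void fill_inputs(REAL u, REAL r)
{
    int i, j, k;
    for (k = 0; k < N + 2; k++)
        for (j = 0; j < N + 2; j++)
            for (i = 0; i < N + 2; i++) {
                Uold[k][j][i] = u;
                rhs[k][j][i] = r;
            }
}

/* Hypothetical helper: one sweep of the stencil over the interior points,
 * using the same expression and the same directives as in the post. */
void stencil_sweep(REAL factor, REAL factor2)
{
    int i, j, k;
#pragma acc data copyin(Uold, rhs) copy(Unew)
    {
#pragma acc kernels loop independent
        for (k = 1; k <= N; k++) {
#pragma acc loop independent
            for (j = 1; j <= N; j++) {
#pragma acc loop independent
                for (i = 1; i <= N; i++) {
#pragma acc cache(rhs[k-1:k+1][j-1:j+1][i-1:i+1])
#pragma acc cache(Uold[k-1:k+1][j-1:j+1][i-1:i+1])
                    Unew[k][j][i] = factor*(rhs[k][j][i]
                        + factor2*(Uold[k][j][i-1] + Uold[k][j][i+1]
                                 + Uold[k][j-1][i] + Uold[k][j+1][i]
                                 + Uold[k+1][j][i] + Uold[k-1][j][i])
                        + Uold[k-1][j-1][i] + Uold[k-1][j+1][i]
                        + Uold[k-1][j][i-1] + Uold[k-1][j][i+1]
                        + Uold[k][j-1][i-1] + Uold[k][j+1][i-1]
                        + Uold[k][j-1][i+1] + Uold[k][j+1][i+1]
                        + Uold[k+1][j-1][i] + Uold[k+1][j+1][i]
                        + Uold[k+1][j][i-1] + Uold[k+1][j][i+1]);
                }
            }
        }
    }
}
```

When built for the host only (directives ignored), calling `fill_inputs(1.0, 2.0)` and then `stencil_sweep(0.5, 2.0)` should leave every interior cell of `Unew` at 0.5 * (2 + 2*6 + 12) = 13, which gives a simple reference value to compare the accelerator result against.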

What do you think I should try?

Thank you,

Valeriu