Hi,
I am trying to use the PGI model for my 3-D Jacobi application, which has three nested loops. The loops are rectangular and have no loop-carried dependencies. I compiled with the -Msafeptr switch so the compiler can assume the pointers do not alias. However, it seems that the pgcc compiler parallelized only the outermost loop.
#pragma acc data region copy(U[0:N+1][0:N+1][0:N+1]) copyin(Un[0:N+1][0:N+1][0:N+1]) copyin(b[0:N-1][0:N-1][0:N-1]) local(tmp[0:N+1][0:N+1][0:N+1])
{
    for (int it = 1; it <= nIters; it++) {
        #pragma acc region
        {
            for (k = 1; k < N+1; k++)
                for (j = 1; j < N+1; j++)
                    for (i = 1; i < N+1; i++)
                        Un[i][j][k] = c * (U[i-1][j][k] + U[i+1][j][k] + U[i][j-1][k] + U[i][j+1][k] + U[i][j][k-1] + U[i][j][k+1] - c2*b[i-1][j-1][k-1]);
        }
        tmp = U;
        U = Un;
        Un = tmp;
    }
}
Here is the message from the compiler:
146, Generating local(tmp[:N+1][:N+1][:N+1])
     Generating copyin(b[:N-1][:N-1][:N-1])
     Generating copyin(Un[:N+1][:N+1][:N+1])
     Generating copy(U[:N+1][:N+1][:N+1])
155, Loop is parallelizable
     Accelerator kernel generated
     155, #pragma acc for parallel, vector(256)
156, Loop is parallelizable
157, Loop is parallelizable
Is this because of a current restriction of the PGI model?