PGI accelerator model with nested loops

tannguyen · September 9, 2010, 6:41am

Hi,

I am trying to use the PGI Accelerator model for my 3-D Jacobi application, which has three nested loops. The loops are rectangular and have no loop-carried dependences. I used the -Msafeptr switch to handle pointers. However, it seems that the pgcc compiler parallelized only the outermost loop.

#pragma acc data region copy(U[0:N+1][0:N+1][0:N+1]) copyin(Un[0:N+1][0:N+1][0:N+1]) copyin(b[0:N-1][0:N-1][0:N-1]) local(tmp[0:N+1][0:N+1][0:N+1])
{
    for (int it = 1; it <= nIters; it++) {
        #pragma acc region
        {
            for (k=1; k<N+1; k++)
                for (j=1; j<N+1; j++)
                    for (i=1; i<N+1; i++)
                        Un[i][j][k] = c * (U[i-1][j][k] + U[i+1][j][k] + U[i][j-1][k] + U[i][j+1][k] + U[i][j][k-1] + U[i][j][k+1] - c2*b[i-1][j-1][k-1]);
        }

        tmp = U;
        U = Un;
        Un = tmp;
    }
}
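For comparison, here is a minimal CPU-only sketch of the same sweep and buffer swap, which could serve as a reference when checking results from the accelerator build. It is not the poster's code: the flat-array layout, the helper names `idx`, `jacobi_sweep` and `jacobi_run`, and the choice to store `b` on the same (n+2)^3 grid as `U` (rather than the offset `b[i-1][j-1][k-1]` indexing above) are all assumptions made to keep the example self-contained.

```c
#include <stddef.h>

/* Hypothetical helper: flat index into an (n+2)^3 grid (interior 1..n plus a one-cell halo). */
static size_t idx(int n, int i, int j, int k) {
    size_t s = (size_t)n + 2;
    return ((size_t)i * s + (size_t)j) * s + (size_t)k;
}

/* One Jacobi sweep over the interior points; reads U and b, writes Un.
   Assumption: b lives on the same grid as U, unlike the offset indexing in the post. */
static void jacobi_sweep(int n, double c, double c2,
                         const double *U, double *Un, const double *b) {
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= n; j++)
            for (int k = 1; k <= n; k++)
                Un[idx(n, i, j, k)] = c * (U[idx(n, i - 1, j, k)] + U[idx(n, i + 1, j, k)]
                                         + U[idx(n, i, j - 1, k)] + U[idx(n, i, j + 1, k)]
                                         + U[idx(n, i, j, k - 1)] + U[idx(n, i, j, k + 1)]
                                         - c2 * b[idx(n, i, j, k)]);
}

/* Run iters sweeps, swapping the two buffers after each one.
   Returns whichever buffer holds the most recent result. */
static double *jacobi_run(int n, int iters, double c, double c2,
                          double *U, double *Un, const double *b) {
    for (int it = 1; it <= iters; it++) {
        jacobi_sweep(n, c, c2, U, Un, b);
        double *tmp = U;   /* same pointer swap as in the post */
        U = Un;
        Un = tmp;
    }
    return U;
}
```

Because the buffers are swapped by pointer, the caller has to use the returned pointer rather than assume the result ended up in the original `U`.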

Here is the message from the compiler:

146, Generating local(tmp[:N+1][:N+1][:N+1])
Generating copyin(b[:N-1][:N-1][:N-1])
Generating copyin(Un[:N+1][:N+1][:N+1])
Generating copy(U[:N+1][:N+1][:N+1])
155, Loop is parallelizable
Accelerator kernel generated
155, #pragma acc for parallel, vector(256)
156, Loop is parallelizable
157, Loop is parallelizable

Is this because of a current restriction of the PGI model?

tannguyen · September 9, 2010, 6:54am

I also used loop directives to instruct the compiler to map loop parallelism to GPU parallelism but it didn’t help:

#pragma acc region
{
    #pragma acc for parallel vector(8)      /* map to blocks */
    for (j=1; j<N+1; j++) {
        #pragma acc for seq unroll(4)       /* sequential */
        for (k=1; k<N+1; k++) {
            #pragma acc for vector(8)       /* map to threads */
            for (i=1; i<N+1; i++) {
                Un[i][j][k] = c * (U[i-1][j][k] + U[i+1][j][k] + U[i][j-1][k] + U[i][j+1][k] + U[i][j][k-1] + U[i][j][k+1] - c2*b[j][k]);
            }
        }
    }
}

The message from the compiler is:
157, Loop is parallelizable
Accelerator kernel generated
157, #pragma acc for parallel, vector(8)
159, Loop is parallelizable
162, Loop is parallelizable
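As a rough illustration of what a schedule like `parallel, vector(8)` means for a single loop, the sketch below strip-mines an index range into chunks of 8 and walks the lanes within each chunk. It runs serially on the CPU and is only a conceptual model, not the code the PGI compiler generates; the name `strip_mined_visit` and the `VEC` constant are made up for the example.

```c
#define VEC 8   /* assumed vector width, matching vector(8) in the directives above */

/* Hypothetical illustration: visit indices lo..hi (inclusive) in strip-mined order.
   Each chunk would correspond to one thread block and each lane to one thread;
   here both loops simply run serially. Returns the number of chunks used. */
static int strip_mined_visit(int lo, int hi, int *visits) {
    int count = hi - lo + 1;
    int nblocks = (count + VEC - 1) / VEC;          /* round up to whole chunks */
    for (int block = 0; block < nblocks; block++) { /* "parallel" dimension */
        for (int lane = 0; lane < VEC; lane++) {    /* "vector" dimension */
            int i = lo + block * VEC + lane;
            if (i <= hi)                            /* guard the ragged last chunk */
                visits[i]++;
        }
    }
    return nblocks;
}
```

In this picture, the compiler messages above say that only the loop at line 157 was scheduled this way, even though the loops at lines 159 and 162 were also reported as parallelizable.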

Tan.

MatColgrove · September 9, 2010, 5:56pm

Hi Tan,

This is a known issue with how the compiler treats the outer loop’s index variable. The good news is that it will be fixed in this month’s 10.9 release.

Thanks,
Mat

tannguyen · September 9, 2010, 6:17pm

Thanks Mat, I can’t wait to see the new release :).

Tan.