I’m having a strange issue with pgcc not parallelizing a nested loop, depending on the order of the loops.
For example, I have:
    void function (vars) {
        int row, col;
        #pragma omp parallel for private(row, col)
148     for (row = 0; row < NUM; ++row) {
150         for (col = 0; col < NUM - 1; ++col) {
                ...
            }
        }
206 }
Compiling with pgcc -fast -mp -Minfo gives me:
148, Parallel region activated
     Parallel loop activated with static block schedule
206, Barrier
     Parallel region terminated
150, Invariant if transformation
     Loop not vectorized: may not be beneficial
If I switch the order of the for loops, I get the same output but without the “Loop not vectorized: may not be beneficial” message.
Why would it be doing this? The order of the loops shouldn’t affect how the nest is parallelized, since the iterations are independent. I would use the second ordering, but I have an operation that needs to run in the outer loop only.
OpenACC doesn’t have a problem with this, so I’m not sure why OpenMP is having trouble. Any advice?
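In case it helps to reproduce the message, here is a minimal sketch of the loop shape described above. The names `fill`, `grid`, and `row_scale`, the value of `NUM`, and the arithmetic in the loop body are placeholders made up for illustration, not the real code; only the pragma and the loop bounds come from the snippet above.

```c
#define NUM 8   /* placeholder size, not the real problem size */

static double grid[NUM][NUM];   /* placeholder data; zero-initialized */

/* Same loop shape as in the question: a parallel outer loop over rows,
 * one operation done in the outer loop only, then an inner loop over
 * columns that stops one short of NUM. */
void fill(void) {
    int row, col;
    #pragma omp parallel for private(row, col)
    for (row = 0; row < NUM; ++row) {
        /* Stand-in for the operation that lives in the outer loop only. */
        double row_scale = 2.0 * row;
        for (col = 0; col < NUM - 1; ++col) {
            /* Stand-in loop body; each (row, col) is written independently. */
            grid[row][col] = row_scale + col;
        }
    }
}
```

Swapping the two for loops here would force the `row_scale` computation into the inner loop, which is the reason for wanting to keep the row loop on the outside.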