OpenMP not parallelizing nested loop, depends on order

I’m having a strange issue with pgcc not parallelizing a nested loop, depending on the order of the loops.

For example, I have:

void function (vars) {
    int row, col;
    #pragma omp parallel for private(row, col)
    for (row = 0; row < NUM; ++row) {            /* line 148 */
        for (col = 0; col < NUM - 1; ++col) {    /* line 150 */

            ...
        }
    }
}                                                /* line 206 */

Compiling with pgcc -fast -mp -Minfo gives me:
148, Parallel region activated
     Parallel loop activated with static block schedule
206, Barrier
     Parallel region terminated
150, Invariant if transformation
     Loop not vectorized: may not be beneficial

If I switch the order of the for loops, then I get the same thing but without the “Loop not vectorized: may not be beneficial”.

Why would it be doing this? The order of the loops shouldn’t affect how it is parallelized, since they are independent. I would use the second way, but I have an operation that is in the outer loop only.

OpenACC doesn’t have a problem with this, so I’m not sure why OpenMP is having trouble. Any advice?

Hi Kyle,

“Why would it be doing this? The order of the loops shouldn’t affect how it is parallelized, since they are independent.”

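It’s still parallelized: the “Parallel loop activated” message at line 148 shows that the outer loop is being divided among the OpenMP threads. The “Loop not vectorized” message at line 150 is about vectorization of the inner loop, which is a separate optimization.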
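
To make the distinction concrete, here is a minimal sketch of the same loop shape. The function name scale_rows, the factor array, the loop body, and the value of NUM are all hypothetical, since the original body is elided; only the pragma and the loop bounds follow the posted code.

```c
#define NUM 64  /* hypothetical size; the original NUM is not shown */

/* Hypothetical loop body: scale each row by a per-row factor. */
void scale_rows(double a[NUM][NUM], const double factor[NUM])
{
    int row, col;

    /* OpenMP divides the iterations of the OUTER loop among threads.
       This is what "Parallel loop activated" reports. */
    #pragma omp parallel for private(row, col)
    for (row = 0; row < NUM; ++row) {
        /* An operation needed once per row can stay in the outer loop. */
        const double f = factor[row];

        /* Each thread runs this inner loop on its own rows. Whether the
           compiler also turns it into SIMD instructions is a separate
           decision, and is what "Loop not vectorized" reports. */
        for (col = 0; col < NUM - 1; ++col) {
            a[row][col] *= f;
        }
    }
}
```

With this shape the threading comes from the outer loop, so the per-row operation stays where it is, and the vectorization message only says that the compiler did not expect SIMD code for the inner loop to pay off.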

  • Mat