How to parallelize the outer loop

Hi, a simple question.

Please take a look at the code below:

for(i=0;i<n;i++)
 {
 for(j=0;j<n;j++)
 {
 if(A<B[j])
 A=pow(B[j],2);
 }
 }

A and B are arrays of the same size. Because the inner loop only changes the value of A, and the value of B[j] is unchanged, I want to use OpenACC to parallelize the outer loop. How do I do that?

Hi,

You could add a directive like the following before the outer loop:

#pragma acc region

However, the compiler notes that there is a scalar dependency on the assignment of A inside the inner loop body, which is carried up to the outer loop as well:

main:
13, Generating present_or_copyin(B[:])
Generating NVIDIA code
14, Loop carried scalar dependence for ‘A’ at line 18
Accelerator scalar kernel generated
16, Loop carried scalar dependence for ‘A’ at line 18
Generated 1 prefetches in scalar loop

I’m not sure you could deterministically compute a value for A in a parallel computation due to this scalar dependency.
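To illustrate why the order matters when A is a scalar, here is a minimal sketch in plain C. The helper name scan_scalar and the order array are hypothetical and not from the original code; it applies the same update as the inner loop, visiting B in whatever order it is given, and uses b * b in place of pow(b, 2).

```c
/* Hypothetical helper: apply the thread's update to a scalar value,
 * visiting the elements of B in the sequence given by `order`. */
double scan_scalar(double a, const double *B, const int *order, int n)
{
    for (int k = 0; k < n; k++) {
        double b = B[order[k]];
        if (a < b)
            a = b * b;   /* same value as pow(b, 2) */
    }
    return a;
}
```

With B = {3, 5} and a starting value of 0, visiting B as 3 then 5 gives 9, while visiting it as 5 then 3 gives 25. The result depends on the order in which the iterations run, which is what the loop-carried dependence message is pointing at.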

Hope this helps,

+chris

Hi Sisy,

Did you really mean for “A” to be an array? If so, then to just accelerate the outer loop, you can do something like:

#pragma acc kernels loop gang vector independent
for(i=0;i<n;i++) 
 { 
#pragma acc loop seq
 for(j=0;j<n;j++) 
 { 
 if(A[i]<B[j]) 
 A[i]=pow(B[j],2); 
 } 
 }

“independent” may not be needed if you have declared A and B with the C99 “restrict” qualifier.
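As a rough sketch of the “restrict” variant, assuming A and B are double arrays of length n (the thread does not give their types): the function name update_rows, the copy/copyin data clauses, and the local variable a are my own additions for illustration, and b * b stands in for pow(B[j],2).

```c
/* Hypothetical wrapper around the loop nest from this thread.
 * `restrict` promises the compiler that A and B do not overlap. */
void update_rows(double *restrict A, const double *restrict B, int n)
{
    /* Each outer iteration reads and writes only A[i]. */
    #pragma acc kernels loop gang vector copy(A[0:n]) copyin(B[0:n])
    for (int i = 0; i < n; i++) {
        double a = A[i];              /* running value for this i only */
        #pragma acc loop seq
        for (int j = 0; j < n; j++) {
            if (a < B[j])
                a = B[j] * B[j];      /* same value as pow(B[j], 2) */
        }
        A[i] = a;
    }
}
```

A compiler without OpenACC support simply ignores the pragmas and runs the loops sequentially, so the function can be checked on the host first. Whether the outer loop is actually parallelized without “independent” is something to confirm in the compiler feedback messages.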

  • Mat