Hi, a simple question.
Please take a look at the code below:
for(i=0;i<n;i++)
{
    for(j=0;j<n;j++)
    {
        if(A<B[j])
            A=pow(B[j],2);
    }
}
A and B are arrays of the same size. The inner loop only changes the value of A, and the value of B[j] is unchanged, so I want to use OpenACC to parallelize the outer loop. How can I do that?
Hi,
You could add a directive like the following before the outer loop:
#pragma acc region
However, the compiler notes that there is a scalar dependency on the assignment of A inside the inner loop body, which is carried up to the outer loop as well:
main:
     13, Generating present_or_copyin(B[:])
         Generating NVIDIA code
     14, Loop carried scalar dependence for 'A' at line 18
         Accelerator scalar kernel generated
     16, Loop carried scalar dependence for 'A' at line 18
         Generated 1 prefetches in scalar loop
I’m not sure you could deterministically compute a value for A in a parallel computation due to this scalar dependency.
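To illustrate why the scalar version is order dependent, here is a minimal C sketch. The helper name scan_scalar is hypothetical, and b[j] * b[j] stands in for pow(B[j], 2) so the snippet needs no math library. It applies the same update as the posted loop to the elements of b in the order given:

```c
/* Hypothetical helper (not from the original post): applies the scalar
   update from the question to b[0..n-1] in order, starting from a,
   and returns the final value of A. */
double scan_scalar(const double *b, int n, double a)
{
    for (int j = 0; j < n; j++) {
        /* Same test and update as the posted loop;
           b[j] * b[j] is used in place of pow(B[j], 2). */
        if (a < b[j])
            a = b[j] * b[j];
    }
    return a;
}
```

Starting from A = 0, visiting the values {2, 3} yields 4, while visiting {3, 2} yields 9. The final value depends on the order in which iterations run, which a parallel schedule does not guarantee.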
Hope this helps,
+chris
Hi Sisy,
Did you really mean for “A” to be an array? If so, then to just accelerate the outer loop, you can do something like:
#pragma acc kernels loop gang vector independent
for(i=0;i<n;i++)
{
    #pragma acc loop seq
    for(j=0;j<n;j++)
    {
        if(A[i]<B[j])
            A[i]=pow(B[j],2);
    }
}
“independent” may not be needed if you have declared A and B with the C99 “restrict” qualifier.
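As a rough sketch of the "restrict" variant, the loop nest above could be wrapped in a function like the one below. The function name update_rows, the double element type, and the explicit copy/copyin data clauses are my assumptions rather than anything from the original code, and b[j] * b[j] is used in place of pow(B[j], 2). Whether the compiler then parallelizes the outer loop without "independent" should be checked in its feedback messages.

```c
/* Hypothetical wrapper (name, element type, and data clauses are assumptions).
   A and B are declared restrict, so the compiler may assume they do not alias. */
void update_rows(int n, double *restrict A, const double *restrict B)
{
    /* Outer loop: each iteration reads and writes only A[i], and only reads B. */
    #pragma acc kernels loop gang vector copy(A[0:n]) copyin(B[0:n])
    for (int i = 0; i < n; i++) {
        /* Inner loop stays sequential because A[i] is updated in order. */
        #pragma acc loop seq
        for (int j = 0; j < n; j++) {
            if (A[i] < B[j])
                A[i] = B[j] * B[j];
        }
    }
}
```

A compiler without OpenACC support ignores the pragmas and runs the same loops sequentially, which makes it easy to compare results against the accelerated build.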