Hi, a simple question,

please kindly see the attached code:


A and B are arrays of the same size. The inner loop only changes the value of A, and the value of B[j] is unchanged, so I want to use OpenACC to parallelize the outer loop. How can I do that?


You could add a directive like the following before the outer loop:

#pragma acc region

However, the compiler notes that there is a scalar dependency on the assignment of A inside the inner loop body, which is carried up to the outer loop as well:

13, Generating present_or_copyin(B[:])
Generating NVIDIA code
14, Loop carried scalar dependence for ‘A’ at line 18
Accelerator scalar kernel generated
16, Loop carried scalar dependence for ‘A’ at line 18
Generated 1 prefetches in scalar loop

I’m not sure you could deterministically compute a value for A in a parallel computation due to this scalar dependency.
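To illustrate what a loop-carried scalar dependence looks like, here is a minimal sketch. The attached code is not shown in this thread, so the function name `scalar_chain` and the update rule are purely hypothetical; the only point is that when A is a single scalar, each iteration reads the value the previous iteration wrote, so the result depends on execution order.

```c
/* Hypothetical example (not the original poster's code): A is one scalar
   shared by every iteration of both loops. */
float scalar_chain(int n, const float *B)
{
    float A = 0.0f;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            /* Reads the A written by the previous iteration, so the
               iterations cannot safely run in a different order. */
            A = A * 0.5f + B[j];
        }
    }
    return A;
}
```

Running the iterations concurrently in this sketch would leave the final value of A dependent on scheduling, which is the kind of situation the compiler message is warning about.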

Hi Sisy,

Did you really mean for “A” to be an array? If so, then to just accelerate the outer loop, you can do something like:

#pragma acc kernels loop gang vector independent
#pragma acc loop seq

“independent” may not be needed if you have specified A and B with the C99 “restrict” attribute.
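As a rough sketch of where these two directives would go, assuming A really is an array and each outer iteration only writes its own A[i]: the function name `accumulate`, the loop body, and the `copy`/`copyin` data clauses below are all assumptions, since the attached code is not shown in this thread.

```c
/* Hypothetical kernel (assumed loop body): each outer iteration i updates
   only A[i] and reads B, so the outer iterations are independent. */
void accumulate(int n, float *restrict A, const float *restrict B)
{
    /* Outer loop: parallelized across gangs and vector lanes.
       The data clauses are an assumption, added because A and B are
       passed as pointers and the compiler cannot infer their extents. */
    #pragma acc kernels loop gang vector independent copy(A[0:n]) copyin(B[0:n])
    for (int i = 0; i < n; ++i) {
        /* Inner loop: run sequentially within each outer iteration. */
        #pragma acc loop seq
        for (int j = 0; j < n; ++j) {
            A[i] += B[j];
        }
    }
}
```

When built without OpenACC support, the pragmas are ignored and the function runs sequentially with the same result, which makes it easy to compare against the accelerated build.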

