How to parallelize the outer loop

#pragma acc data copyin(C, sum, X) copy(x)
{
#pragma acc kernels loop independent
    for(i = 1; i < SIZE; i++) {
        sum[i] = 0;
        for(k = 0; k < 3; k++) {
            sum[i] += C[i-1][0][k] * X[k];
        }
        x[i] = sum[i];
    }
}


    104, Loop is parallelizable
         Accelerator kernel generated
        104, #pragma acc loop gang, vector(32) /* blockIdx.x threadIdx.x */
             CC 1.0 : 10 registers; 48 shared, 12 constant, 0 local memory bytes
             CC 2.0 : 17 registers; 0 shared, 64 constant, 0 local memory bytes
    106, Complex loop carried dependence of 'sum' prevents parallelization
         Loop carried reuse of 'sum' prevents parallelization
         Inner sequential loop scheduled on accelerator

If I want to parallelize only the outer "i" loop and remove the dependency, how should I modify my code?
Thank you so much.

Hi vincent5552,

If I want to parallelize only the outer "i" loop and remove the dependency, how should I modify my code?

The loop dependency message is for the inner “k” loop only. The outer “i” loop is being accelerated. This is what you want, correct?

  • Mat

Hi mkcolg,

Thank you very much for helping me.

Yes. If I only accelerate the “i” loop, I think there is no dependency in the “k” loop, because each “i” iteration should be independent of the others, right?


I think there is no dependency in the “k” loop

The same “sum[i]” is being accumulated for each iteration of “k”, so there is a dependency in this loop. You could change sum to be a scalar and then use the “reduction” clause to parallelize it. However, since the trip count is only 3, it’s better to execute it serially within the generated kernel.
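For reference, the scalar-plus-reduction variant mentioned above might look roughly like the sketch below. It is only an illustration under assumptions: the function name compute_x_reduction, the value of SIZE, the double element type, and the size of the middle dimension of C are all hypothetical, since the original post does not show them. With a trip count of 3 this form is unlikely to pay off, as noted above.

```c
#define SIZE 8  /* hypothetical size; the original value is not shown */

/* Hypothetical helper: x[i] = sum over k of C[i-1][0][k] * X[k], i = 1..SIZE-1.
   The element type and the middle dimension of C are assumptions. */
void compute_x_reduction(double C[SIZE][1][3], const double X[3], double x[SIZE])
{
    int i, k;
#pragma acc data copyin(C[0:SIZE], X[0:3]) copy(x[0:SIZE])
    {
#pragma acc kernels loop independent
        for (i = 1; i < SIZE; i++) {
            /* Scalar accumulator declared inside the loop body,
               so each "i" iteration has its own copy. */
            double s = 0.0;
            /* The reduction clause tells the compiler how to combine
               the partial sums if it parallelizes the "k" loop. */
#pragma acc loop reduction(+:s)
            for (k = 0; k < 3; k++) {
                s += C[i-1][0][k] * X[k];
            }
            x[i] = s;
        }
    }
}
```

Without an OpenACC-enabled compiler the pragmas are ignored and the function runs serially with the same result.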

I think there is no dependency in the “k” loop, because each “i” iteration should be independent of the others, right?

No, just because “i” is parallelizable it does not mean the “k” loop is parallelizable as well. Multiple “k” loops will be executed concurrently, but the “k” loop itself is executed serially.

  • Mat
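
To make the "parallel over i, serial over k" arrangement explicit, one option is to accumulate into a local scalar and mark the inner loop as sequential. This is a minimal sketch, not the original poster's final code: the function name compute_x_serial_k, the value of SIZE, the double element type, and the middle dimension of C are assumptions made for illustration.

```c
#define SIZE 8  /* hypothetical size; the original value is not shown */

/* Hypothetical helper: x[i] = sum over k of C[i-1][0][k] * X[k], i = 1..SIZE-1.
   The element type and the middle dimension of C are assumptions. */
void compute_x_serial_k(double C[SIZE][1][3], const double X[3], double x[SIZE])
{
    int i, k;
#pragma acc data copyin(C[0:SIZE], X[0:3]) copy(x[0:SIZE])
    {
        /* Outer "i" iterations are independent and run in parallel. */
#pragma acc kernels loop independent
        for (i = 1; i < SIZE; i++) {
            double s = 0.0;  /* per-iteration scalar replaces sum[i] */
            /* Inner "k" loop runs serially inside each "i" iteration. */
#pragma acc loop seq
            for (k = 0; k < 3; k++) {
                s += C[i-1][0][k] * X[k];
            }
            x[i] = s;
        }
    }
}
```

Because the scalar replaces the sum array, sum no longer needs to appear in the data clause in this sketch.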