Question about the reduction clause in OpenACC

Hi, Everyone,

I have a question about an article on the Dr. Dobb’s website that combines OpenACC and OpenMP in a single program unit. The code snippet in question is excerpted below. Can anyone tell me why the reduction clause (i.e., reduction(+:tmp)) is missing from the OpenACC pragma on line 16, while the same clause is still used by the OpenMP pragma on line 15 for the same loop?


1  void gramSchmidt(restrict float Q[][COLS], const int rows, const int cols) 
2  {
3  #pragma acc data copy(Q[0:rows][0:cols])
4   for(int k=0; k < cols; k++) {
5      double tmp = 0.;
6  #pragma omp parallel for reduction(+:tmp)
7  #pragma acc parallel reduction(+:tmp)
8      for(int i=0; i < rows; i++) tmp +=  (Q[i][k] * Q[i][k]);
9      tmp = sqrt(tmp);
11 #pragma omp parallel for
12 #pragma acc parallel loop
13    for(int i=0; i < rows; i++) Q[i][k] /= tmp;
15 #pragma omp parallel for reduction(+:tmp)
16 #pragma acc parallel loop
17     for(int j=k+1; j < cols; j++) {
18       tmp=0.;
19       for(int i=0; i < rows; i++) tmp += Q[i][k] * Q[i][j];
20       for(int i=0; i < rows; i++) Q[i][j] -= tmp * Q[i][k];
21     }
22   }
23 }

Hi Li,

To me, the question is not why it’s missing from OpenACC but why it’s included for OpenMP.

Only the outer loop (the j loop at line 17) is parallelized, which makes the inner i loops sequential within each j iteration. The OpenACC reduction clause is only needed for parallel reductions, since those require extra code to set up a partial reduction and then launch a second kernel to perform the final reduction.
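To make the distinction concrete, here is a minimal sketch of the two cases. It is my own illustration, not Rob's code: the function names, the ROWS/COLS sizes, and the data clauses are assumptions. In the first function every iteration of the parallelized loop adds into the same scalar, so the reduction clause is needed. In the second, tmp is reset and accumulated sequentially inside each j iteration, so it only has to be private to that iteration; declaring it inside the loop body makes that explicit. My guess is that the OpenMP reduction clause on line 15 mainly serves to give each thread its own copy of tmp, which a private(tmp) clause would also do, but that is a guess about intent.

```c
#include <assert.h>

#define ROWS 4   /* hypothetical sizes, chosen only for illustration */
#define COLS 3

/* Case 1: a true parallel reduction. All iterations of the parallel
   loop accumulate into the same scalar, so reduction(+:tmp) is required.
   Returns the sum of squares of column k (the caller would take sqrt). */
double column_sum_of_squares(float Q[][COLS], int rows, int k)
{
    assert(rows > 0 && k >= 0 && k < COLS);
    double tmp = 0.0;
    #pragma acc parallel loop reduction(+:tmp) copyin(Q[0:rows][0:COLS])
    for (int i = 0; i < rows; i++)
        tmp += Q[i][k] * Q[i][k];
    return tmp;
}

/* Case 2: only the outer j loop is parallel. Each j iteration resets
   tmp and runs the inner i loops sequentially, so no reduction clause
   is needed; tmp just has to be private to the iteration. */
void orthogonalize_against(float Q[][COLS], int rows, int cols, int k)
{
    assert(rows > 0 && cols <= COLS && k >= 0 && k < cols);
    #pragma acc parallel loop copy(Q[0:rows][0:COLS])
    for (int j = k + 1; j < cols; j++) {
        double tmp = 0.0;  /* declared in the loop body: one copy per iteration */
        for (int i = 0; i < rows; i++) tmp += Q[i][k] * Q[i][j];
        for (int i = 0; i < rows; i++) Q[i][j] -= tmp * Q[i][k];
    }
}
```

A compiler without OpenACC support ignores the pragmas and runs both loops sequentially, which should give the same results and is a convenient way to check correctness before moving to the accelerator.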

I’ll send a note to Rob and ask if he’ll clarify his intent here.

Best Regards,