Reduction results in wrong results. Bug?

PaulPa · November 25, 2012, 12:40pm

Hi,

I’m currently working on this little piece of code:

#pragma acc parallel loop gang private(i,j) copy(force[0:DIM][0:N]) copyin(pos[0:DIM][0:N])
   for(i=0;i < num_particles ; ++i){
      double tmp_x = 0.0;
      double tmp_y = 0.0;
      double tmp_z = 0.0;
#pragma acc loop //reduction(+:tmp_x,tmp_z,tmp_z)
      for(j=0;j < num_particles; ++j){
         double dx = (pos[0][j] - pos[0][i]);
         double dy = (pos[1][j] - pos[1][i]);
         double dz = (pos[2][j] - pos[2][i]);
         double r = dx * dx + dy * dy + dz * dz;
         double tmp_f;
         if(r != 0.0){
            double s = 1.0 / r;
            s = s * s * s;
            tmp_f = 100.0 * s/r * (1.0 - 2.0 * s);
         }else{
            tmp_f = 0.0;
         }
         tmp_x += tmp_f * dx;
         tmp_y += tmp_f * dy;
         tmp_z += tmp_f * dz;
      }
      force[0][i] += tmp_x;
      force[1][i] += tmp_y;
      force[2][i] += tmp_z;
   }
}

If I compile this code as shown, it works perfectly (same results as the sequential C code) and I receive the following compiler feedback:

    472, Accelerator kernel generated
        472, CC 2.0 : 47 registers; 16 shared, 96 constant, 0 local memory bytes
        476, #pragma acc loop gang /* blockIdx.x */
        481, #pragma acc loop vector(256) /* threadIdx.x */

However, I think that a reduction for tmp_x, tmp_y and tmp_z should be required. If I uncomment the reduction, I receive exactly the same compiler feedback (i.e. nothing about added reductions), but the results are wrong.

What am I missing here?

Thanks.
Best,
Paul

MatColgrove · November 26, 2012, 7:27pm

Hi Paul,

In order to parallelize the inner loop, the compiler must be automatically generating the reductions; otherwise you’d be getting wrong answers without the reduction clause. I’m not sure why adding the reduction clause would then yield wrong answers. It seems like a compiler error.

Can you either post or send to PGI Customer Support (trs@pgroup.com) a reproducing example?

Thanks,
Mat

PaulPa · November 26, 2012, 7:52pm

Thanks for the quick reply. I’ll file a bug report shortly.

Do you know why I don’t see any compiler feedback which tells me that a reduction clause has been generated (in both cases)?

Best,
Paul

MatColgrove · November 26, 2012, 9:50pm

Hi Paul,

I just found the error:

#pragma acc loop reduction(+:tmp_x,tmp_z,tmp_z)

Changing one of the “tmp_z” to “tmp_y” fixes the problem.
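For reference, here is a minimal sketch of the kernel with the corrected clause. The function name compute_forces, the flat DIM*n array layout, and the explicit vector keyword are assumptions made to keep the example self-contained; they are not taken from Paul's full source. Built without OpenACC support, the pragmas are ignored and the loops run sequentially.

```c
#include <assert.h>

#define DIM 3

/* Hypothetical stand-alone version of the force loop.
 * pos and force are flat arrays of DIM*n doubles, with
 * component d of particle i stored at index d*n + i. */
void compute_forces(int n, const double *pos, double *force)
{
    assert(n > 0 && pos != 0 && force != 0);

#pragma acc parallel loop gang copy(force[0:DIM*n]) copyin(pos[0:DIM*n])
    for (int i = 0; i < n; ++i) {
        double tmp_x = 0.0;
        double tmp_y = 0.0;
        double tmp_z = 0.0;
        /* Each accumulator is listed exactly once: tmp_x, tmp_y, tmp_z. */
#pragma acc loop vector reduction(+:tmp_x,tmp_y,tmp_z)
        for (int j = 0; j < n; ++j) {
            double dx = pos[0 * n + j] - pos[0 * n + i];
            double dy = pos[1 * n + j] - pos[1 * n + i];
            double dz = pos[2 * n + j] - pos[2 * n + i];
            double r = dx * dx + dy * dy + dz * dz;
            double tmp_f = 0.0;
            if (r != 0.0) {
                double s = 1.0 / r;
                s = s * s * s;
                tmp_f = 100.0 * s / r * (1.0 - 2.0 * s);
            }
            tmp_x += tmp_f * dx;
            tmp_y += tmp_f * dy;
            tmp_z += tmp_f * dz;
        }
        force[0 * n + i] += tmp_x;
        force[1 * n + i] += tmp_y;
        force[2 * n + i] += tmp_z;
    }
}
```

With the original clause, tmp_y was never named as a reduction variable while tmp_z was named twice, which is consistent with the wrong results Paul saw.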

I’ll need to talk with one of our compiler engineers about why we’re not printing a message for inner loop reductions.

Mat

PaulPa · November 27, 2012, 6:43am

Oops :) Thanks for catching that. Yes, compiler feedback would be nice.

Maybe another question:
I’m curious if the private(i,j) clause is really necessary. In OpenMP it would be (at least private(j)). Is that true for OpenACC as well?
I ask because if I don’t use the private clause, my program runs 15% faster and still gives the same results.

Best,
Paul

MatColgrove · November 27, 2012, 3:37pm

I’m curious if the private(i,j) clause is really necessary

Scalars are privatized by default, so no, privatizing i and j is not necessary. As you found out, privatizing them can actually slow down your code. Explicitly privatizing a scalar variable creates an array of that variable in global memory, with one element per thread. If it’s not explicitly privatized, the variable is declared locally in the kernel and is thus more likely to be stored in a register, which is much faster to access.
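As a small illustration of that point, the sketch below omits the private clause and simply declares the scalars where they are used. The function row_sums and its parameters are hypothetical and not part of Paul's code; whether the scalars actually end up in registers depends on the compiler and target.

```c
/* Hypothetical example: sum each row of an n x m matrix stored row-major.
 * No private(...) clause is used: i, j and sum are scalars declared
 * inside the loops, so each iteration gets its own copy. */
void row_sums(int n, int m, const double *a, double *out)
{
#pragma acc parallel loop gang copyin(a[0:n*m]) copyout(out[0:n])
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
#pragma acc loop vector reduction(+:sum)
        for (int j = 0; j < m; ++j) {
            sum += a[i * m + j];
        }
        out[i] = sum;
    }
}
```

Comparing the compiler feedback and run time of this form against a variant with an explicit private clause is one way to check the effect on a given compiler.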

Mat

PaulPa · November 27, 2012, 5:17pm

Is that true for the PGI compiler or for OpenACC in general?

MatColgrove · November 28, 2012, 12:21am

Is that true for the PGI compiler or for OpenACC in general?

I’m not sure. I believe it’s true for Cray, but I don’t know about CAPS. The OpenACC 2.0 spec does clear this up a bit by adding a “default(none)” clause, which will require the user to explicitly specify which variables are private. The exception is the loop index variables, which are always private.
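A minimal sketch of what default(none) might look like in practice, assuming a compiler that implements OpenACC 2.0 or later. The function scale and its parameters are hypothetical, and how strictly a given compiler enforces the clause may vary.

```c
/* Hypothetical example: y[i] = factor * x[i].
 * With default(none), every variable used in the region needs an explicit
 * data attribute, so the arrays appear in copyin/copyout and the scalars
 * n and factor appear in firstprivate. The loop index i needs no clause. */
void scale(int n, double factor, const double *x, double *y)
{
#pragma acc parallel loop default(none) copyin(x[0:n]) copyout(y[0:n]) firstprivate(n, factor)
    for (int i = 0; i < n; ++i) {
        y[i] = factor * x[i];
    }
}
```

Leaving one of the variables out of the clauses should then produce a compile-time diagnostic instead of a silently chosen default.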

Mat

tull · January 24, 2014, 10:26pm

Hello,
We have added compiler feedback for when a reduction is performed.

Thanks,
Dave
