problem with scalar variable live-out from loop

Hi,

the following code is not parallelized by the compiler:

[code]
#pragma acc kernels present(a,b)
#pragma acc loop independent
  for(int i=0;i<size;++i){
    double J1 = 0.;
    getWeight(a,J1);
    a[i] -= b[i]*J1;
  }
[/code]

The compiler reports:

     38, Generating present(a[:],b[:])
         Generating copy(J1)
     45, Accelerator restriction: scalar variable live-out from loop: J1
         Accelerator scalar kernel generated

getWeight has been decorated with acc routine and simply changes J1 (passed to the function as a reference) based on some computation using array a.
There is no other variable named J1 living in the outer scope, so there should not be any name/scope conflict.

I don’t see why J1 should live-out from the loop. Every thread should have its own copy and J1 is not needed/used after the loop.

If I change the signature of getWeight from “void getWeight(double* a, double& J1)” to “double getWeight(double* a)” and return J1 by value, i.e.

[code]
#pragma acc kernels present(a[:],b[:])
#pragma acc loop independent
  for(int i=0;i<size;++i){
    double J1 = getWeight(a);
    a[i] -= b[i]*J1;
  }
[/code]

the compiler is willing to parallelize the loop and does not mention anything about scalar variable live-out.
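For reference, here is a minimal self-contained sketch of the two variants side by side. The body of getWeight is not shown in this thread, so getWeightRef and getWeightVal below are hypothetical stand-ins that return a constant, and updateRef/updateVal are names made up for this sketch. The data clauses use copy/copyin instead of present so that the functions work without an enclosing data region; that is also an assumption, not the original code.

```cpp
#include <cassert>

// Hypothetical stand-in: writes the weight through a reference parameter.
// The real computation on `a` is not shown in the thread; a constant keeps
// the sketch deterministic and free of read/write races on `a`.
#pragma acc routine seq
void getWeightRef(const double* a, double& J1) {
  (void)a;
  J1 = 2.0;
}

// Hypothetical stand-in: same weight, returned by value.
#pragma acc routine seq
double getWeightVal(const double* a) {
  (void)a;
  return 2.0;
}

// Variant 1: J1 is a loop-local scalar passed by reference.
void updateRef(double* a, const double* b, int size) {
  assert(size >= 0);
#pragma acc kernels copy(a[0:size]) copyin(b[0:size])
#pragma acc loop independent
  for (int i = 0; i < size; ++i) {
    double J1 = 0.;
    getWeightRef(a, J1);
    a[i] -= b[i] * J1;
  }
}

// Variant 2: J1 is initialized from the return value.
void updateVal(double* a, const double* b, int size) {
  assert(size >= 0);
#pragma acc kernels copy(a[0:size]) copyin(b[0:size])
#pragma acc loop independent
  for (int i = 0; i < size; ++i) {
    double J1 = getWeightVal(a);
    a[i] -= b[i] * J1;
  }
}
```

Without -acc the pragmas are ignored and both functions run serially on the host, so they produce identical results; the difference described here only shows up in what the compiler reports with -Minfo=accel.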

I would have expected that the two versions would be treated equally by the compiler. Is this a bug or am I doing something wrong?

Thanks,
LS

Hi,

I think this issue is related to the following post:
http://www.pgroup.com/userforum/viewtopic.php?t=4656&start=0&postdays=0&postorder=asc&highlight=
and according to the last response it should have been fixed…

Thanks,
LS

Hi LS,

It looks like J1 is being passed by reference. In this case the compiler can’t tell whether the variable is aliased, so it must assume that it is.
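As an illustration of why a by-reference argument leads to this conservative assumption, the sketch below uses a routine with the same signature as the by-reference getWeight that keeps the address of its argument. The names getWeightEscaping, g_last and addressEscapes are hypothetical and exist only for this example; it is not meant to describe how the compiler's analysis is actually implemented.

```cpp
#include <cassert>

// Hypothetical global, used only to show an address escaping from a call.
static double* g_last = nullptr;

// Same signature as the by-reference getWeight, but the callee stores the
// address of the caller's variable. From the signature alone, a caller
// cannot rule this out.
void getWeightEscaping(const double* a, double& J1) {
  assert(a != nullptr);
  J1 = a[0];
  g_last = &J1;  // the caller's local is now reachable through g_last
}

// Returns true if the routine captured the address of the local J1.
bool addressEscapes() {
  const double a[1] = {3.0};
  double J1 = 0.;
  getWeightEscaping(a, J1);
  return g_last == &J1;
}
```

Once the address can escape like this, the variable is no longer provably local to one iteration, which is the situation the compiler has to guard against when it only sees a reference parameter.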

Another workaround is to manually privatize the variable:

[code]
double J1;
#pragma acc kernels present(a,b)
#pragma acc loop independent private(J1)
  for(int i=0;i<size;++i){
    J1 = 0.;
    getWeight(a,J1);
    a[i] -= b[i]*J1;
  }
[/code]
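A self-contained version of this workaround might look like the sketch below. The routine body is a hypothetical stand-in (the real getWeight is not shown in this thread), updateWithPrivate is a made-up wrapper name, and copy/copyin replace present so that no enclosing data region is needed.

```cpp
#include <cassert>

// Hypothetical stand-in for the by-reference getWeight; returns a constant
// so the result is easy to check by hand.
#pragma acc routine seq
void getWeightByRef(const double* a, double& J1) {
  (void)a;
  J1 = 2.0;
}

// J1 is declared outside the loop and privatized explicitly on the loop
// directive, so the compiler does not have to prove that it is loop-local.
void updateWithPrivate(double* a, const double* b, int size) {
  assert(size >= 0);
  double J1;
#pragma acc kernels copy(a[0:size]) copyin(b[0:size])
#pragma acc loop independent private(J1)
  for (int i = 0; i < size; ++i) {
    J1 = 0.;
    getWeightByRef(a, J1);
    a[i] -= b[i] * J1;
  }
}
```

With a = b = {1, 2, 3, 4} each element becomes a[i] - 2*b[i], i.e. {-1, -2, -3, -4}, which gives a quick host-side check that the privatized loop computes the same values as the serial one.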
- Mat

Hi Mat,

So how is this different from TPR 21358? Your explanation basically means that local variables cannot be passed by reference to device routines because of the aliasing issue, i.e. it would always prevent parallelization.

In your proposed workaround, does the private clause make J1 private for each thread or private per gang, i.e. threads of a gang share it? I suppose the former, but just to make sure I am not making wrong assumptions.

Thanks,
LS

Hi LS,

You’re correct, it really should be the same thing.

I think we made a mistake saying this was fixed in 15.9. When I go back and try the example with 15.9, it still fails, though it does look like the fix did get into 16.1. Which version are you using? Can you try installing 16.1?

Thanks,
Mat

% cat fs21358.c
#ifdef _OPENACC
#include <openacc.h>
#endif

#pragma acc routine worker
void set( int* in_out )
{
  *in_out = ( *in_out ) * 3;
}

int main( int argc, char * argv[] )
{
#ifdef _OPENACC
  const acc_device_t dev_type = acc_get_device_type() ;
  acc_init( dev_type );
#endif

  float a[100];
  // int j = 5;

#pragma acc data copyout(a[0:100])

#ifdef _OPENACC
#pragma acc parallel
#pragma acc loop //private(j)
#endif
  for( int i=0; i < 100; ++i )
  {
    int j = 5;
    set(&j);
    a[i] = j;
  }

  printf("%f\n",a[1]);
#ifdef _OPENACC
  acc_shutdown( dev_type );
#endif

  return 0;
}

% pgcc -acc -O0 -Minfo=accel -ta=tesla:nollvm -V15.9 -c fs21358.c
set:
      7, Generating acc routine worker
         Generating Tesla code
main:
     21, Generating copyout(a[:])
     24, Accelerator kernel generated
         Generating Tesla code
         27, #pragma acc loop gang /* blockIdx.x */
     24, Generating copy(j)
     27, Accelerator restriction: scalar variable live-out from loop: j
% pgcc -acc -O0 -Minfo=accel -ta=tesla:nollvm -V16.1 -c fs21358.c
set:
      7, Generating acc routine worker
         Generating Tesla code
main:
     21, Generating copyout(a[:])
     24, Accelerator kernel generated
         Generating Tesla code
         27, #pragma acc loop gang /* blockIdx.x */

Hi Mat,

I am using 15.10. I will try with 16.1, but I was wondering whether the new license key that I will have to generate for 16.1 will also work for 15.10 and earlier versions. There is something mentioned about a new format that is being used.

Thanks,
LS

Yes. The new 16.1 licenses will work with earlier versions of the compiler.