Problem with acc routine

When compiling this code (using PGI 15.1):

#ifdef _OPENACC
#include <openacc.h>
#endif

// Device routine that modifies its argument in place.
#pragma acc routine worker
void set( int* in_out )
{
   *in_out = ( *in_out ) * 3;
}

int main( int argc, char * argv[] )
{
#ifdef _OPENACC
   const acc_device_t dev_type( acc_get_device_type() );
   acc_init( dev_type );
#endif

   float a[100];

#pragma acc data copyout(a[0:100])
   {
#ifdef _OPENACC
#pragma acc parallel
#pragma acc loop
#endif
      for( int i( 0 ); i < 100; ++i )
      {
         int j = 5;
         set( &j );   // local scalar passed by reference
         a[i] = j;
      }
   }

#ifdef _OPENACC
   acc_shutdown( dev_type );
#endif

   return 0;
}

I get this message:

pgc++ -fast -Minfo=all -acc -ta:nvidia

set(int *):
      5, Generating acc routine worker
         Generating Tesla code
     17, Generating copyout(a[:])
         Accelerator kernel generated
         25, #pragma acc loop gang /* blockIdx.x */
     17, Generating copy(j)
         Generating Tesla code
     25, Accelerator restriction: scalar variable live-out from loop: j

Why is the local variable j copied? When I comment out the call to set, no copy is generated.

Are functions that (can) modify their arguments not supported?

Thank you.


Hi L,

Sorry, passing local scalar variables by reference isn’t supported. I added an RFE (TPR#21358) to see if it’s something we can add.
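Until that is supported, one possible workaround is to pass the scalar by value and return the result, so that no address of a loop-local variable is taken. The following is only a sketch: the names tripled and fill_tripled are hypothetical, the routine is declared seq rather than worker for simplicity, and it has not been checked against PGI 15.1.

```cpp
#include <cassert>

// Hypothetical replacement for set(): takes the scalar by value and
// returns the result instead of writing through a pointer.
#pragma acc routine seq
int tripled( int value )
{
   return value * 3;
}

// Fills a[0..n-1] with the tripled seed. Without OpenACC enabled the
// pragmas are ignored and the loop simply runs on the host.
void fill_tripled( float* a, int n, int seed )
{
   assert( a != nullptr && n >= 0 );   // host-side sanity check

#pragma acc parallel loop copyout(a[0:n])
   for( int i = 0; i < n; ++i )
   {
      int j = seed;
      j = tripled( j );                // no address of j is taken
      a[i] = static_cast<float>( j );
   }
}
```

Calling fill_tripled( a, 100, 5 ) corresponds to the loop in the original post. This only helps when the routine has a single scalar result.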

Here’s the message I get after adding “volatile” to the declaration of “j”:

% pgcc -acc -V15.1 testj.c -Minfo=accel -ta=tesla:nollvm
PGC-W-0095-Type cast required for this conversion (testj.c: 30)
      7, Generating acc routine worker
         Generating Tesla code
PGC-S-0155-Accelerator region ignored; see -Minfo messages  (testj.c: 24)
     21, Generating copyout(a[:])
     24, Accelerator region ignored
     27, Accelerator restriction: invalid loop
     29, Accelerator restriction: Unsupported variable reference type
PGC/x86-64 Linux 15.1-0: compilation completed with severe errors

I added “volatile” to prevent the compiler from hoisting j’s declaration out of the loop. Hoisting is a normal optimization but, as with the local array in your previous post, it shouldn’t be done in this case. I added TPR#21359 to address this issue.


Hi Mat,

Thanks for the clarification. I think this would be a very useful feature; without it, acc routine does not cover many of the functions found in general code.
As a workaround we tried having a function return a struct, but that does not seem to be supported either.
For the code we are looking to port to the GPU, we are coming to the conclusion that we unfortunately have to leave OpenACC aside for the time being and resort to CUDA.
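For routines that produce more than one value, an alternative to returning a struct might be to let the routine write through pointers into arrays that are already covered by a data clause, rather than into loop-local scalars. This is an untested sketch. It assumes that pointers to elements of such arrays are accepted by the compiler, which the thread does not confirm, and the names scale_and_shift and apply are hypothetical.

```cpp
#include <cassert>

// Hypothetical routine with two outputs. Both pointers are expected to
// refer to elements of arrays present on the device, not to local scalars.
#pragma acc routine seq
void scale_and_shift( int* scaled, int* shifted )
{
   *scaled  = ( *scaled ) * 3;
   *shifted = ( *shifted ) + 1;
}

// Applies scale_and_shift element-wise to two arrays of length n.
// Without OpenACC enabled the pragma is ignored and the loop runs on the host.
void apply( int* scaled, int* shifted, int n )
{
   assert( scaled != nullptr && shifted != nullptr && n >= 0 );

#pragma acc parallel loop copy(scaled[0:n], shifted[0:n])
   for( int i = 0; i < n; ++i )
   {
      scale_and_shift( &scaled[i], &shifted[i] );
   }
}
```

The cost of this design is extra array storage for values that would otherwise live in registers.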

PS: As a PGI customer is there a way to track the status of a ticket or is this for internal use only?

Agreed, it’s just a matter of whether the underlying hardware and device software layer support passing local variables by reference.

PS: As a PGI customer is there a way to track the status of a ticket or is this for internal use only?

Not at this time, but we are planning to move to NVBugs, which is externally visible. No firm timeline, though.

  • Mat

TPR 21358 - UF: User would like support to pass local scalars by reference to device routines

has been fixed in the current 15.9 release.