Problem with OpenACC and variable initialization

I have the following code:

module PYINT8
	DOUBLE PRECISION XPVMD(-6:6),XPANL(-6:6),XPANH(-6:6),XPBEH(-6:6),XPDIR(-6:6)
	!$acc declare create(XPVMD,XPANL,XPANH,XPBEH,XPDIR)
end module

module PYINT9
	DOUBLE PRECISION VXPVMD(-6:6),VXPANL(-6:6),VXPANH(-6:6),VXPDGM(-6:6)
	!$acc declare create(VXPVMD,VXPANL,VXPANH,VXPDGM)
end module

module pythia_converted
    implicit none

contains
	SUBROUTINE PYGVMD(ISET,KF,X,Q2 ,P2  ,ALAM,XPGA,VXPGA)
	!$acc routine seq
		IMPLICIT NONE
		!arguments
		integer::iset
		integer::kf
		double precision::x
		double precision::q2
		double precision::p2
		double precision::ALAM
		double precision,dimension(-6:6)::XPGA
		double precision,dimension(-6:6) ::VXPGA

		RETURN
	END SUBROUTINE
	
	SUBROUTINE PYGGAM(ISET,X,Q2,P2,IP2,F2GM,XPDFGM)
	!$acc routine seq      
		use PYINT8
		use PYINT9

		IMPLICIT NONE	
		!$acc routine (PYGVMD)

		!arguments
		integer::iset
		double precision::x
		double precision::q2
		double precision::p2
		integer::ip2
		double precision::f2gm
		double precision,dimension(-6:6) ::XPDFGM
		!locals
		double precision,dimension(-6:6) ::XPGA
		double precision,dimension(-6:6) ::VXPGA
		double precision::q2a
		double precision::facnor
		double precision::p2mx

		double precision::ALAM = 0.20D0

		CALL PYGVMD(ISET,1,X,Q2A,P2MX,ALAM,XPGA,VXPGA)
	END SUBROUTINE

end module

The compiler gives me:

pgfortran -g -O2 -fPIC -Mnomain -lpgf90 -acc -ta=nvidia -Minfo=accel, -I./  -o m2.f95_o -c m2.f95
pygvmd:
     15, Generating acc routine seq
PGF90-S-0155-Compiler failed to translate accelerator region (see -Minfo messages): No device symbol for address reference (coding/m2.f95: 1)
pyggam:
     31, Generating acc routine seq
  0 inform,   0 warnings,   1 severes, 0 fatal for pyggam

But if I remove the initial value from the declaration (ALAM = 0.20D0) and assign it in the body instead:

double precision::ALAM
ALAM = 0.20D0

It works fine!

Hi esepulveda26247,

The compiler should issue a better error message here (we have an open problem report for this, TPR#21780). Variables in OpenACC routines can’t have the “SAVE” attribute, since this puts the variable into static local storage, which is not accessible from the device. Variables initialized in their declaration, such as ALAM, are given the SAVE attribute implicitly.

The fix is to simply initialize ALAM in the body of the routine instead of the declaration.

      double precision::ALAM
      ALAM = 0.20D0
      CALL PYGVMD(ISET,1,X,Q2A,P2MX,ALAM,XPGA,VXPGA)
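
For readers who hit the same restriction from C or C++, a rough analogue is sketched below. It assumes that a function-level `static` variable plays the role of Fortran's implicit SAVE (the OpenACC specification lists function static variables as unsupported in `routine` functions). The names `scale_ok` and `sum_scaled` are hypothetical and not part of the original code.

```cpp
// Hypothetical C++ analogue of the ALAM fix; all names are illustrative.
// Built without OpenACC flags, the pragmas are ignored and this runs on the host.

#pragma acc routine seq
double scale_ok(double x) {
    // Avoid: static double alam = 0.20;  (static storage, like an implicit SAVE)
    double alam = 0.20;  // automatic variable, assigned on every call
    return alam * x;
}

double sum_scaled(const double *in, int n) {
    double total = 0.0;
    // Calls the seq routine from device code, one element per iteration.
    #pragma acc parallel loop reduction(+:total) copyin(in[0:n])
    for (int i = 0; i < n; ++i) {
        total += scale_ok(in[i]);
    }
    return total;
}
```

Calling `sum_scaled` on a three-element array `{1.0, 2.0, 3.0}` should give a value close to 1.2, whether the loop runs on the host or on the device.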

Hope this helps,
Mat

Hi Mat,

This is similar to the error above. Using compiler 16.5 (C/C++), I see the following message while compiling one of the files:

Compiler failed to translate accelerator region (see -Minfo messages): No device symbol for address reference

This happens when I add the following to the code:

 int _slist2[4];
 #pragma acc declare create(_slist2)

It seems like this is not supported. Could you please confirm?

Hi PramodK,

Putting global fixed-size arrays in a “declare create” clause is supported in 16.5 and should work correctly. I’ve used it many times without issue.

Can you post more information and/or an example of the code and what options you’re using?

Using externs in a “declare create” does require RDC, so if you’re using the “-ta=tesla:nordc” option, this could be the source of the failure.

Other than that, could you indicate how “_slist2” is declared (i.e. is it an extern?) and in what context it’s being used?
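
As a point of comparison, here is a minimal single-file sketch of the supported pattern: a global fixed-size array in a “declare create”, filled on the host and then pushed to the device. The names `coeffs` and `fill_and_sum` are hypothetical. If a second file referred to the array as an extern with its own “declare create”, the device link step (RDC) would be needed, as noted above.

```cpp
// Hypothetical sketch; `coeffs` and `fill_and_sum` are illustrative names.
// Built without OpenACC flags, the pragmas are ignored and this runs on the host.

int coeffs[4];
#pragma acc declare create(coeffs)  // device copy exists for the program lifetime

int fill_and_sum() {
    // Fill the host copy, then refresh the device copy explicitly.
    for (int i = 0; i < 4; ++i) {
        coeffs[i] = i + 1;
    }
    #pragma acc update device(coeffs)

    int total = 0;
    #pragma acc parallel loop reduction(+:total) present(coeffs)
    for (int i = 0; i < 4; ++i) {
        total += coeffs[i];
    }
    return total;
}
```

“declare create” only allocates the device copy, so without the `update device` step the device array would not hold the values written on the host.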

- Mat

Hello Mat,

I am looking at this again after 4 years, mainly because we now want to build a shared library with nordc. With a shared library plus nordc, the values are not updated. I assume this is expected and that there is no way around it other than changing the code to not use the “declare” directive?

#include <cstdio>

static float xmy_data[3] = {1,2,3};
#pragma acc declare copyin(xmy_data)

void init(double val) {
    #pragma acc parallel loop present(xmy_data[0:3])
    for(int i=0; i<3; i++)
    {
        float v = xmy_data[i];
        printf("=%f\n", v);
    }
}

This works correctly without nordc. But with nordc we get all 0 values:

+ pgc++ -acc -Minfo test.cpp ext.cpp main.cpp
test.cpp:
init(double):
      8, Generating Tesla code
         16, #pragma acc loop gang, vector(3) /* blockIdx.x threadIdx.x */
sample():
     23, Generating Tesla code
         25, #pragma acc loop gang, vector(3) /* blockIdx.x threadIdx.x */
ext.cpp:
y_square():
      2, Generating acc routine seq
         Generating Tesla code
main.cpp:
+ ./a.out
=1.000000
=2.000000
=3.000000
Implicit wait  file=/gpfs/bbp.cscs.ch/project/proj16/kumbhar/pramod_scratch/acc_lib_expr/test.cpp function=_Z4initd line=8 device=0 threadid=1
Implicit wait  file=/gpfs/bbp.cscs.ch/project/proj16/kumbhar/pramod_scratch/acc_lib_expr/test.cpp function=_Z6samplev line=23 device=0 threadid=1
Done

+ pgc++ -acc -ta=tesla:nordc -Minfo ext.cpp -c -fPIC
y_square():
      2, Generating acc routine seq
         Generating Tesla code
+ pgc++ -acc -ta=tesla:nordc -Minfo test.cpp -c -fPIC
init(double):
      8, Generating Tesla code
         16, #pragma acc loop gang, vector(3) /* blockIdx.x threadIdx.x */
sample():
     23, Generating Tesla code
         25, #pragma acc loop gang, vector(3) /* blockIdx.x threadIdx.x */
+ pgc++ -acc -ta=tesla:nordc -shared -o test.so test.o
+ pgc++ -acc -ta=tesla:nordc main.cpp test.so
main.cpp:
+ ./a.out
=0.000000
=0.000000
=0.000000
Implicit wait  file=/gpfs/bbp.cscs.ch/project/proj16/kumbhar/pramod_scratch/acc_lib_expr/test.cpp function=_Z4initd line=8 device=0 threadid=1

Things like global data and cross-file device function calls require linking by the device linker (nvlink), which is the main functionality that RDC provides. Without RDC, the device link step is not performed, hence the “declare” directive can’t be used on global variables.

However, since you first posted, we have added limited RDC support for Linux shared objects, so you may try again with a newer compiler (19.10 or newer) and remove the “nordc” flag. The device symbols will only be accessible within the shared object’s device code, but hopefully that’s enough here.
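
If dropping “declare” turns out to be the simpler route, one possible shape of that change is sketched below. It assumes a data clause on the compute construct is enough to get the host values onto the device without any device-side global symbol. This has not been verified against a nordc shared-library build, and `sum_xmy_data` is a hypothetical function, not one from the posted code.

```cpp
// Hypothetical sketch of avoiding "declare" for a file-scope array.
// Built without OpenACC flags, the pragma is ignored and this runs on the host.

static float xmy_data[3] = {1, 2, 3};

float sum_xmy_data() {
    float total = 0.0f;
    // copyin on the construct copies the host values for this region only,
    // instead of relying on a device global created by "declare".
    #pragma acc parallel loop copyin(xmy_data[0:3]) reduction(+:total)
    for (int i = 0; i < 3; ++i) {
        total += xmy_data[i];
    }
    return total;
}
```

The trade-off is a host-to-device copy on every entry to the region. An enclosing `data` region or `enter data` could reduce that if the array is reused often.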

-Mat

Hello Mat,

I am using:

$ pgc++ --version
pgc++ 19.10-0 LLVM 64-bit target on x86-64 Linux -tp skylake

And if I run without nordc, I get:

+ pgc++ -acc -Minfo ext.cpp -c -fPIC
y_square():
      2, Generating acc routine seq
         Generating Tesla code
+ pgc++ -acc -Minfo test.cpp -c -fPIC
init(double):
     10, Generating Tesla code
         26, #pragma acc loop gang, vector(3) /* blockIdx.x threadIdx.x */
sample():
     34, Generating Tesla code
         36, #pragma acc loop gang, vector(3) /* blockIdx.x threadIdx.x */
+ pgc++ -acc -shared -o test.so test.o ext.cpp
ext.cpp:
+ pgc++ -acc main.cpp test.so
main.cpp:
+ ./a.out
Failing in Thread:1
call to cuModuleGetGlobal returned error 500: Not found

(Note that I am doing this in the context of the thread “Clarification on using OpenACC in a shared library”, where I want to use this with shared library support.)