Calling functions from OpenACC computation regions

Here is a simple example that gives wrong results:

#include <iostream>
#include <accelmath.h>
#include <openacc.h>

const int numBodies=10;
float S1[numBodies] __attribute__((aligned(64)));
#pragma acc declare create(S1)
struct float3{ float x, y, z; };
float3 normalize(float3 v)
{
    float d=sqrtf(v.x*v.x+v.y*v.y+v.z*v.z);
    float3 rt;
    rt.x=v.x/d;
    rt.y=v.y/d;
    rt.z=v.z/d;
    return rt;
}
float3 normalize(float3 v, int i)
{
    float d=sqrtf(v.x*v.x+v.y*v.y+v.z*v.z);
    S1[i]=d;
    float3 rt;
    rt.x=v.x/d;
    rt.y=v.y/d;
    rt.z=v.z/d;
    return rt;
}
float function(float3 v)
{
    float d=sqrtf(v.x*v.x+v.y*v.y+v.z*v.z);
    return d;
}
int
main(int argc, char **argv)
{
    float Vx=0.0f, Vy=0.0f, Vz=0.0f;
    float DIST=0.0f;
    #pragma acc parallel loop copy(Vx,Vy,Vz,DIST) present(S1)
    for(int i=0; i<numBodies; ++i)
    {
        float3 vector={1,2,3};
        float3 res=normalize(vector,i);
        #pragma acc loop vector
        for(int j=0; j<numBodies; ++j)
        {
            float dist=function(vector);
            DIST=dist;
        }
        Vx=res.x;
        Vy=res.y;
        Vz=res.z;
    }
    std::cout<<"Vx="<<Vx<<",Vy="<<Vy<<",Vz="<<Vz<<",DIST="<<DIST<<std::endl;
    #pragma acc update host(S1)
    std::cout<<"S1:"<<std::endl;
    for(int i=0; i<numBodies; ++i) std::cout<<S1[i]<<" ";
    std::cout<<std::endl;
}

The output it gives is:
Vx=inf,Vy=nan,Vz=0,DIST=0
S1:
0 0 0 0 0 0 0 0 0 0

I use the following compiler options:
$ cmake -DCMAKE_C_COMPILER=pgcc -DCMAKE_CXX_COMPILER=pgc++ -DCMAKE_C_FLAGS="-acc -Minfo=all -mcmodel=medium -ta=tesla:cc30" -DCMAKE_CXX_FLAGS="-acc -Minfo=all -mcmodel=medium -ta=tesla:cc30"

I wrote the float3 normalize(float3 v, int i) overload to inspect the intermediate value d, because float3 normalize(float3 v) does not work properly.
Why is Vx=inf, Vy=nan, Vz=0?
Why is DIST=0?
Why is S1 all zeros?
Perhaps it is very simple, but I don’t understand. How do I call functions from OpenACC parallel compute regions? Could you tell me?

Hi @and,

It looks like there are a couple of problems here.

I can reproduce your error with the PGI 17.10 compiler, but 18.1 gives the expected non-NaN values. So there was probably a compiler issue with creating your device routines. I can work around this problem with 17.10 by adding “-Minline” so the routines are inlined instead of called.
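
On the general question of how to call a function from a compute region: the OpenACC way to request a device version of a function is the "routine" directive. The following is only a minimal sketch, not a tested fix for the 17.10 issue above; the names length and normalized_lengths are hypothetical helpers made up for illustration, and math.h is used in place of accelmath.h so that the code also builds as plain serial C++ when OpenACC is not enabled.

```cpp
#include <math.h>

struct float3 { float x, y, z; };

// "routine seq" asks the compiler to build a device version of this
// function that runs sequentially inside a single device thread.
#pragma acc routine seq
float length(float3 v)
{
    return sqrtf(v.x * v.x + v.y * v.y + v.z * v.z);
}

// A device routine may call other device routines.
#pragma acc routine seq
float3 normalize(float3 v)
{
    float d = length(v);
    float3 rt = { v.x / d, v.y / d, v.z / d };
    return rt;
}

// Hypothetical helper: stores the length of a normalized vector in each
// element of out. Every iteration writes only its own slot.
void normalized_lengths(float *out, int n)
{
    #pragma acc parallel loop copyout(out[0:n])
    for (int i = 0; i < n; ++i) {
        float3 v = { 1.0f, 2.0f, 3.0f };
        out[i] = length(normalize(v));
    }
}
```

Calling normalized_lengths(out, n) on a float array of n elements should fill it with values close to 1, since a normalized vector has unit length.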

Note that your code does have a race condition on the assignments to Vx, Vy, and Vz since their values will depend on which iteration of the outer loop is executed last.
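
One possible way to remove that race, sketched under my own assumptions rather than taken from your code: give every outer iteration its own output slot, and combine the inner-loop value with a reduction instead of a plain assignment. The function names normalize_all and max_length are hypothetical, and using "max" as the reduction operator is just one choice that makes the result independent of iteration order.

```cpp
#include <math.h>

struct float3 { float x, y, z; };

#pragma acc routine seq
float3 normalize(float3 v)
{
    float d = sqrtf(v.x * v.x + v.y * v.y + v.z * v.z);
    float3 rt = { v.x / d, v.y / d, v.z / d };
    return rt;
}

// Each iteration writes to its own element, so no two iterations
// ever store to the same location.
void normalize_all(float *vx, float *vy, float *vz, int n)
{
    #pragma acc parallel loop copyout(vx[0:n], vy[0:n], vz[0:n])
    for (int i = 0; i < n; ++i) {
        float3 v = { 1.0f, 2.0f, 3.0f };
        float3 res = normalize(v);
        vx[i] = res.x;
        vy[i] = res.y;
        vz[i] = res.z;
    }
}

// A scalar shared by all iterations is combined with a reduction
// rather than overwritten by whichever iteration happens to run last.
float max_length(int n)
{
    float dist = 0.0f;
    #pragma acc parallel loop reduction(max:dist)
    for (int j = 0; j < n; ++j) {
        float3 v = { 1.0f, 2.0f, 3.0f };
        dist = fmaxf(dist, sqrtf(v.x * v.x + v.y * v.y + v.z * v.z));
    }
    return dist;
}
```

With this layout the host can read vx[i], vy[i], vz[i] for any i after the loop, and the result no longer depends on the order in which iterations execute.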

% pgc++ test.cpp -ta=tesla:cc30 -w -mcmodel=medium -V17.10 -fast  ; a.out
Vx=inf,Vy=nan,Vz=0,DIST=0
S1:
0 0 0 0 0 0 0 0 0 0

% pgc++ test.cpp -ta=tesla:cc30 -w -mcmodel=medium -V17.10 -fast -Minline ; a.out
PGCC/x86 Linux 17.10-0: compilation completed with warnings
Vx=0.267261,Vy=0.534522,Vz=0.801784,DIST=3.74166
S1:
3.74166 3.74166 3.74166 3.74166 3.74166 3.74166 3.74166 3.74166 3.74166 3.74166

% pgc++ test.cpp -ta=tesla:cc30 -w -mcmodel=medium -V18.1 -fast ; a.out
Vx=0.267261,Vy=0.534522,Vz=0.801784,DIST=3.74166
S1:
3.74166 3.74166 3.74166 3.74166 3.74166 3.74166 3.74166 3.74166 3.74166 3.74166

Hope this helps,
Mat

Thank you! It works.