Hello
I wonder if it is possible to accelerate a function without inlining its nested function calls. I know that function calls are not allowed inside a parallel compute region, but I think they are allowed inside a data region. For example (pseudo code):
void foo(double* x, double* y){
    double *a, *b, *c;   /* work arrays of length n */
    int i;
    a=(double*)malloc(n*sizeof(double));
    b=(double*)malloc(n*sizeof(double));
    c=(double*)malloc(n*sizeof(double));
    parFoo1(x,y,c);
    for (i = 0; i<M; i++){
        parFoo2(a, x);
        parFoo2(b, x);
        parFoo1(a,b,y);
    }
}
Here, parFoo* are functions containing the corresponding acc pragmas. The problem is that the data for the arrays a, b and x is copied on every call in each iteration, although it only needs to be transferred to the device once for the whole loop. So my idea was to define a data region in foo() like this:
void foo(){
#pragma acc data region copyin(x), copyout(y), local(a,b)
{
… the code …
}
}
Unfortunately, my approach did not work: the compiler still copies the arrays inside the accelerated functions parFoo*(). I get feedback like this:
parFoo1:
60, Generating copyin(x[0:n-1])
Generating compute capability 1.3 kernel
62, Loop is parallelizable
Accelerator kernel generated
My question is whether it is possible to define a data region and avoid copying the needed arrays inside the nested function calls. I guess the parFoo*() functions need to know that the parameter is already a “device pointer”. Could I use the “local” clause to achieve this somehow? Thanks for your hints.
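To make the question concrete, here is a minimal sketch of the behaviour I am after. It is written with OpenACC-style directives (`data`, `create`, `present`), which is an assumption on my part: the PGI Accelerator directives I am using may spell these differently (for example `local` instead of `create`, and `[lower:upper]` bounds instead of `[start:length]`). The bodies of parFoo1 and parFoo2 are hypothetical stand-ins, the initial parFoo1(x,y,c) call is left out, and n and m are passed explicitly instead of being globals.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for parFoo2: out[i] = 2 * in[i].
 * present(...) states that the arrays are already on the device,
 * so no copy should be generated inside this function. */
void parFoo2(double *out, const double *in, int n)
{
    #pragma acc parallel loop present(out[0:n], in[0:n])
    for (int i = 0; i < n; i++)
        out[i] = 2.0 * in[i];
}

/* Hypothetical stand-in for parFoo1: out[i] = u[i] + v[i]. */
void parFoo1(const double *u, const double *v, double *out, int n)
{
    #pragma acc parallel loop present(u[0:n], v[0:n], out[0:n])
    for (int i = 0; i < n; i++)
        out[i] = u[i] + v[i];
}

/* Returns 0 on success, -1 if the work arrays cannot be allocated. */
int foo(const double *x, double *y, int n, int m)
{
    assert(n > 0);
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1;
    }

    /* One data region around the whole loop: x goes to the device once,
     * y comes back once, a and b exist only on the device. */
    #pragma acc data copyin(x[0:n]) copyout(y[0:n]) create(a[0:n], b[0:n])
    {
        for (int it = 0; it < m; it++) {
            parFoo2(a, x, n);
            parFoo2(b, x, n);
            parFoo1(a, b, y, n);
        }
    }

    free(a);
    free(b);
    return 0;
}
```

A compiler without accelerator support simply ignores the pragmas and runs everything on the host, so the sketch can be checked for correctness before looking at the data movement in the compiler feedback.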
Kind regards,
Tim