Hi @all,
A common occurrence in programming is that you have to run several parallelizable tasks one after another.
Say you want to write a function, declared as #pragma acc routine worker on the device, which contains 2 sequential loops, like
#pragma acc routine worker
void foo(){
    #pragma acc loop
    for(size_t i=1;i<100;i++){
        // do stuff
    }
    #pragma acc loop
    for(size_t i=1;i<100;i++){
        // do other stuff
    }
}
Now if these simple for loops are something more complicated, say dot products or matrix multiplications, you certainly do not want to copy and paste a matrix multiplication into your function.
What you would like to do when working with data on the device is:
#pragma acc routine worker
void foo(){
    function1();
    function2();
}
Unfortunately, in OpenACC a worker routine cannot call another routine that contains worker loops, and this is a problem if function1 and function2 contain worker loops.
However, we have this nice attribute:
__attribute__((always_inline)) inline
Does this solve the situation?
So can I say
__attribute__((always_inline)) inline void function1(){
    #pragma acc loop
    for(size_t i=1;i<100;i++){
        // do stuff
    }
}
__attribute__((always_inline)) inline void function2(){
    #pragma acc loop
    for(size_t i=1;i<100;i++){
        // do other stuff
    }
}
and then call these loops from
#pragma acc routine worker
void foo(){
    function1();
    function2();
}
and is this then equivalent to
#pragma acc routine worker
void foo(){
    #pragma acc loop
    for(size_t i=1;i<100;i++){
        // do stuff
    }
    #pragma acc loop
    for(size_t i=1;i<100;i++){
        // do other stuff
    }
}
or are the loops in function1 and function2 then treated as sequential by the compiler?
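For concreteness, a minimal self-contained sketch of the pattern in question might look like the following. The names scale, shift, count_mismatches and check_foo are hypothetical stand-ins, not part of any library, and the single-gang launch is an assumption made so that foo is not executed redundantly by several gangs. The sketch only checks that the results are correct. It does not show whether the compiler actually partitions the inlined loops across workers, which is exactly the open question.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for function1: scales a vector in place.
__attribute__((always_inline)) inline void scale(double* x, std::size_t n, double a) {
    #pragma acc loop
    for (std::size_t i = 0; i < n; i++) {
        x[i] *= a;
    }
}

// Hypothetical stand-in for function2: adds a constant to every element.
__attribute__((always_inline)) inline void shift(double* x, std::size_t n, double b) {
    #pragma acc loop
    for (std::size_t i = 0; i < n; i++) {
        x[i] += b;
    }
}

// The worker routine that calls the two inlined helpers one after another.
#pragma acc routine worker
void foo(double* x, std::size_t n) {
    scale(x, n, 2.0);
    shift(x, n, 1.0);
}

// Runs foo and returns the number of wrong elements (0 means correct).
int count_mismatches() {
    const std::size_t n = 100;
    double x[n];
    for (std::size_t i = 0; i < n; i++) {
        x[i] = static_cast<double>(i);
    }

    // Assumption: one gang only, so foo is not run redundantly by several gangs.
    #pragma acc parallel num_gangs(1) copy(x[0:n])
    {
        foo(x, n);
    }

    // Each element should now hold 2*i + 1.
    int bad = 0;
    for (std::size_t i = 0; i < n; i++) {
        if (x[i] != 2.0 * static_cast<double>(i) + 1.0) {
            bad++;
        }
    }
    return bad;
}

// Convenience wrapper for a quick check from main().
inline void check_foo() {
    assert(count_mismatches() == 0);
}
```

Built without OpenACC support the pragmas are simply ignored and everything runs sequentially on the host, so the same file can serve as a reference result to compare against the device build.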
I have tried the above solution with my code, and now it no longer crashes, regardless of the optimization level.
However, I see no information about the parallelization of the loops within those functions in the compiler output.
Not even that they would be treated sequentially... nvc++ is just silent about the inlined functions function1 and function2. The only hints that something sensible happened are that I could not put a #pragma acc parallel loop directive inside when the function was called from a worker routine (which is expected, since only #pragma acc loop is allowed there), and that the on_device function returned true at runtime even though function1 and function2 are not marked as OpenACC routines.
But I got no messages in the compiler output about how the loops in these inlined functions were actually compiled, whether sequentially or according to their directives.
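The runtime check mentioned above can be sketched roughly as follows, assuming the standard OpenACC runtime call acc_on_device and the _OPENACC macro. The helper names running_on_device and region_ran_on_device are hypothetical. Note that such a probe only tells you where the code ran, not how the loops were partitioned. For that, compiler feedback such as nvc++ with -acc -Minfo=accel remains the only source I know of.

```cpp
#include <cassert>
#ifdef _OPENACC
#include <openacc.h>
#endif

// Hypothetical probe: true only when executing on a non-host device.
#pragma acc routine seq
inline bool running_on_device() {
#ifdef _OPENACC
    return acc_on_device(acc_device_not_host) != 0;
#else
    return false;  // built without OpenACC: everything runs on the host
#endif
}

// Launches a one-gang region and reports where its body actually ran.
bool region_ran_on_device() {
    int flag = 0;
    #pragma acc parallel num_gangs(1) copy(flag)
    {
        flag = running_on_device() ? 1 : 0;
    }
    // Sanity check that the value copied back is one of the two expected ones.
    assert(flag == 0 || flag == 1);
    return flag != 0;
}
```

The same probe could be called from inside an inlined helper to confirm that the helper body really ends up in device code.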
(I want to note that I find it strange that a worker cannot call a worker. Of course one should not be allowed to call a worker function from within a worker loop in a worker function. But not everything in a worker function is a loop; sometimes it is just one large matrix multiplication after another, executed sequentially.)
I hope someone can confirm whether
__attribute__((always_inline)) inline
really can save the day in this situation.