I am working on parallel optimization of a CFD code, and I have run into a question I am not sure how to answer.
My question is: it looks like OpenACC supports the use of “acc routine” for array-valued functions/subroutines (i.e. with arrays as outputs). Suppose the routine is called in an OpenACC parallel region (a parallel loop or kernels loop region), and one argument (with intent(inout)) is an array, for example A(:,:,:). Do all the threads share the array A, or does every thread have its own copy of A? If every thread has its own copy of A, then this way of coding would affect performance greatly, correct?
I hope this is clear. If not, I can put together a simple example code. Thank you very much!
I’m a bit confused by your question in that an array-valued function is a function that has an array as the return type. Array-valued functions are not supported in OpenACC since the compiler would need to create a temp array to hold the return type and allocating temp arrays on the device is very poor for performance.
However, passing in an array as an argument with an intent(inout) is fine.
Then do all the threads share the array A or every thread has their own copy of A.
That depends on what type of “routine” it is. If it’s a “routine seq”, then each call is executed by only one thread, and all threads that call the routine share A. If it’s a “routine vector” or “routine worker”, then the array is shared among the vectors/workers executing the call.
Hopefully this answers the question, but if not, please restate it and, if possible, include example code.
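To illustrate the “routine seq” case, here is a minimal sketch, not taken from your code. The names update_mod, scale_cell, and shared_array_demo, the array size, and the scaling factor are all made up for the example. Every iteration of the parallel loop passes the same array A to the routine and each call writes a different element, so nothing in the source asks for a per-thread copy of A.

```fortran
module update_mod
  implicit none
contains
  ! Hypothetical device routine: each call runs sequentially on one thread.
  subroutine scale_cell(A, i, j, k, factor)
    !$acc routine seq
    real, intent(inout) :: A(:,:,:)   ! the caller's array, not a private copy
    integer, intent(in) :: i, j, k
    real, intent(in)    :: factor
    ! Each call touches only its own element, so concurrent calls do not conflict.
    A(i,j,k) = factor * A(i,j,k)
  end subroutine scale_cell
end module update_mod

program shared_array_demo
  use update_mod
  implicit none
  integer, parameter :: n = 8         ! assumed size, for illustration only
  real :: A(n,n,n)
  integer :: i, j, k

  A = 1.0
  ! Every loop iteration passes the same A to the device routine.
  !$acc parallel loop collapse(3) copy(A)
  do k = 1, n
    do j = 1, n
      do i = 1, n
        call scale_cell(A, i, j, k, 2.0)
      end do
    end do
  end do

  ! Prints T if every element was doubled exactly once.
  print *, 'all elements doubled: ', all(A == 2.0)
end program shared_array_demo
```

If two calls wrote to the same element of A, that would be a race condition, just as with any shared array in a parallel loop, so indexing mistakes inside the routine can show up as wrong results.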
Thank you very much for your reply! I asked this question because I kept getting errors in the solution, and I attributed them to OpenACC not supporting arrays as arguments when calling device routines. Later I found some indexing errors in that acc routine and fixed the code. Sorry for the dumb question, but your reply is still very helpful! I really appreciate that!