Conceptual question about OpenACC design


As part of a thesis, I’m currently working with a research group to develop a parallelization solution similar to OpenACC, so we’re naturally comparing our approach to the one OpenACC uses.

One question we have come across, and which does not seem to be explained in the specification, is: why does OpenACC seem to require annotations like:

#pragma acc enter data copyin(...)

when such information could be derived using static code analysis? Is this kind of annotation optional, and if not, is there a known reason for this requirement?

Thanks in advance

Hi dboen_,

I’m not 100% clear on what you’re asking. Are you asking why OpenACC has data regions in general or specifically unstructured data regions? My answers below are for data regions in general.

First, data regions are optional. Not all target architectures have discrete address spaces; multi-core CPUs, or GPUs used with CUDA Unified Memory, for example, do not.

For devices that do have discrete address spaces, the compiler may in some cases be able to determine the size and shape of arrays. For example, a Fortran allocatable array carries enough information in its descriptor to determine its size and shape. C/C++ pointers, however, are unbounded, so the compiler usually does not have enough information about the size or shape, and the user may need to supply it in a data clause (i.e. “copy”, “copyin”, “copyout”, or “create”).
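
To make the C/C++ case concrete, here is a minimal sketch (the function name saxpy and its signature are only illustrative, not taken from the specification). Because x and y are bare pointers, the array sections x[0:n] and y[0:n], written as start:length, are what tell the compiler how much data to move. Built without OpenACC support, the pragma is ignored and the loop simply runs on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative helper: y[i] += a * x[i] for i in [0, n).
   x and y are plain pointers, so the compiler cannot know their extent;
   the array sections x[0:n] and y[0:n] (start:length) provide it. */
void saxpy(int n, float a, const float *x, float *y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    /* copyin: x is only read on the device.
       copy:   y is read and written, so it is copied in and back out. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

With a Fortran allocatable array the equivalent clause could usually omit the bounds, since the descriptor already carries them.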

The second question is when to create data on the device. With a whole-program data flow analysis you could perhaps determine where arrays are allocated and when they are updated, and so implicitly add data regions and update directives. However, this would have to be done across the entire program, and it would break whenever the code calls libraries that the compiler has no visibility into.
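
As a rough sketch of why the lifetime question is hard to answer statically, consider a small library-style interface where allocation, compute, synchronization and release live in separate functions (the type field_t and all function names here are hypothetical). The unstructured "enter data" and "exit data" directives let the author state the device lifetime explicitly, and "update self" states when the host copy must be refreshed. A compiler that sees only one of these functions at a time could not infer any of that.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container, used only for this illustration. */
typedef struct {
    int    n;
    float *data;
} field_t;

/* Allocate on the host, zero-fill, then create the device copy.
   The device copy outlives this function (unstructured data region). */
field_t field_create(int n)
{
    assert(n > 0);
    field_t f;
    f.n = n;
    f.data = malloc((size_t)n * sizeof *f.data);
    assert(f.data != NULL);
    for (int i = 0; i < n; ++i) {
        f.data[i] = 0.0f;
    }
    float *d = f.data;
    #pragma acc enter data copyin(d[0:n])
    return f;
}

/* Compute on the device copy; no transfer happens here. */
void field_add(field_t *f, float v)
{
    float *d = f->data;
    int n = f->n;
    #pragma acc parallel loop present(d[0:n])
    for (int i = 0; i < n; ++i) {
        d[i] += v;
    }
}

/* Bring the host copy up to date only when the caller asks for it. */
void field_sync_host(const field_t *f)
{
    float *d = f->data;
    int n = f->n;
    #pragma acc update self(d[0:n])
}

/* Release the device copy, then the host memory. */
void field_destroy(field_t *f)
{
    float *d = f->data;
    int n = f->n;
    #pragma acc exit data delete(d[0:n])
    free(f->data);
    f->data = NULL;
    f->n = 0;
}
```

A caller would typically create the field once, call field_add many times, and call field_sync_host only before reading the results on the host.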

Also, static analysis can’t tell how much data will be used at runtime, so it would have to presume that the arrays fit on the device. That is not something that can be presumed in practice.
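
One way a user might handle an array that is too large for the device is to stage it through in pieces. The sketch below is only an illustration under that assumption: the function name scale_in_chunks and the chunk parameter are made up, and the right chunk size depends on the device and is known only at runtime, which is exactly the information static analysis lacks.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative helper: scale x[0..n) by a in place, keeping at most
   `chunk` elements on the device at any one time. */
void scale_in_chunks(int n, int chunk, float a, float *x)
{
    assert(n >= 0 && chunk > 0 && x != NULL);

    for (int start = 0; start < n; start += chunk) {
        /* The last chunk may be shorter than the others. */
        int len = (n - start < chunk) ? (n - start) : chunk;

        /* Only the sub-array x[start:len] is copied to the device
           and copied back when the loop finishes. */
        #pragma acc parallel loop copy(x[start:len])
        for (int i = start; i < start + len; ++i) {
            x[i] *= a;
        }
    }
}
```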

The bottom line is that static analysis can work in some cases, but not all. Hence, your solution will need some method that lets users give more information about the size and shape of the data structures, as well as control over when data is created on the device and when it should be synchronized between the device and host.

Hope this is clear, but if not, please feel free to ask any follow-up questions.

Best Regards,