Passing POD to OpenACC kernel - best practice


I am using PGI 19.10. I need to pass a few simple POD objects and a few scalars to an OpenACC kernel.

struct Point { double x,y,z; };
Point p1,p2,p3,p4,p5,p6;

These objects are stored and updated on host, and will be used as input for GPU kernels.

If I were writing in CUDA, I would pass them to the CUDA kernels by value, and this is what I had assumed OpenACC kernels would do for me. I just wrote things like

// compile with "release build" flags
#pragma acc parallel loop independent
for (int i = 0; i < N; ++i) {
   // use p1 ~ p6 here as input
}

and never found any problems. It is possible that there were problems I was not aware of.

I recently compiled the code with debug flags and noticed that the PGI compiler generated an “implicit copy (if not already present)” for these POD objects, a message I had never seen in the release build. It seemed that if the POD objects were modified on the host between two calls of this kernel, the second call did not see the changes and, moreover, reset the objects to their old values.

With the help of this message, “implicit copy (if not already present)”, I could see why this was happening, so I changed my code to do

#pragma acc enter data create(p1,p2,p3,p4,p5,p6)
#pragma acc exit data delete(p1,p2,p3,p4,p5,p6)

somewhere, and used

#pragma acc update device(p1,p2,p3,p4,p5,p6)

immediately after the POD objects were updated on the host. I also added a “present” clause to the kernel:

#pragma acc parallel loop independent present(p1,p2,p3,p4,p5,p6)
for (int i = 0; i < N; ++i) {
   // use p1 ~ p6 here as input
}
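Put together, the workflow described above looks roughly like the following. This is a minimal sketch, not the original code: only two of the six points are shown, and the helper names (`create_device_copies`, `set_points`, `delete_device_copies`, `combine`) and the kernel body are hypothetical. Without an OpenACC compiler the pragmas are ignored and the code simply runs on the host.

```cpp
// Same POD type as in the question.
struct Point { double x, y, z; };

// Host globals standing in for p1..p6 (two shown to keep the sketch short).
Point p1, p2;

// Hypothetical setup helper: allocate device storage once, no transfer yet.
void create_device_copies() {
    #pragma acc enter data create(p1, p2)
}

// Hypothetical teardown helper: release the device copies.
void delete_device_copies() {
    #pragma acc exit data delete(p1, p2)
}

// Hypothetical setter: update on the host, then push the new values
// into the existing device copies right away.
void set_points(const Point& a, const Point& b) {
    p1 = a;
    p2 = b;
    #pragma acc update device(p1, p2)
}

// Hypothetical kernel: out[i] = i * (p1.x + p2.x).
// present() asserts the structs are already on the device, so on a GPU
// create_device_copies() must have been called first or the launch fails.
void combine(double* out, int n) {
    #pragma acc parallel loop independent present(p1, p2) copyout(out[0:n])
    for (int i = 0; i < n; ++i) {
        out[i] = i * (p1.x + p2.x);
    }
}
```

Typical use: call `create_device_copies()` once, `set_points(...)` whenever the host values change, `combine(...)` as often as needed, and `delete_device_copies()` at shutdown.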

I didn’t expect POD objects and scalars to be handled so differently. I definitely wish that I could treat them the same way, and I wonder what the best practice for passing POD objects is.


These are defined in the OpenACC 3.0 standard as “composite variables”, and the standard specifies their default behavior:

...data construct, or a visible declare directive. If there is no default(present) clause on the construct, an array or composite variable referenced in the parallel construct that does not appear in a data clause for the construct or any enclosing data construct will be treated as if it appeared in a copy clause for the parallel construct.
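To make the contrast with scalars concrete: under the same section of the standard, a scalar referenced in a parallel construct with no data clause is implicitly treated as firstprivate (a per-gang copy initialized from the host value, much like a CUDA by-value argument), whereas a struct falls under the composite-variable rule above and becomes copy. Because copy only transfers when the data is not already present, a struct that already lives on the device is used as-is, which is how stale values can creep in. The two hypothetical functions below are meant to be equivalent; the second simply spells out the clauses the compiler infers for the first. This is an illustrative sketch, not actual compiler output.

```cpp
struct Point { double x, y, z; };

// As written by the user: no data clauses for p or s.
void scale_implicit(double* out, int n, Point p, double s) {
    #pragma acc parallel loop copyout(out[0:n])
    for (int i = 0; i < n; ++i) out[i] = s * p.x;
}

// The same kernel with the implicit data attributes made explicit:
// the composite p is treated as copy, the scalar s as firstprivate.
void scale_explicit(double* out, int n, Point p, double s) {
    #pragma acc parallel loop copyout(out[0:n]) copy(p) firstprivate(s)
    for (int i = 0; i < n; ++i) out[i] = s * p.x;
}
```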

> I definitely wish that I could treat them the same way

You can send a request to the OpenACC folks, but while making this struct firstprivate by default might be fine in your particular case, it would fail if the struct contained dynamic data members (for example, a pointer to a heap-allocated array, where only the pointer value and not the data it points to would be copied). That said, I’ll let them decide whether something could be done.
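If CUDA-style by-value semantics are what you want, the existing firstprivate clause can already be requested explicitly for the structs. A minimal sketch under that assumption, with a hypothetical kernel body: each gang gets its own copy of `p1` and `p2` initialized from the host values at launch, so host updates between launches are picked up and nothing is written back. As with the default discussed above, this is only safe for plain POD structs; a pointer member would be copied as a bare address.

```cpp
struct Point { double x, y, z; };

// Host globals, updated freely between kernel launches.
Point p1, p2;

// Hypothetical kernel: out[i] = p1.x + i * p2.y.
// firstprivate gives each gang a private, host-initialized copy of the
// structs at launch; no present check, no copy back to the host.
void kernel_byvalue(double* out, int n) {
    #pragma acc parallel loop firstprivate(p1, p2) copyout(out[0:n])
    for (int i = 0; i < n; ++i) out[i] = p1.x + i * p2.y;
}
```

The trade-off is that the structs are re-sent on every launch. For a few doubles that is cheap, but it is exactly the repeated data movement that managing them in a data region avoids.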

> wonder what the best practice for passing POD objects is

It’s best to manage them via data regions, as your updated version does, since this reduces data movement: the structs are transferred only when you explicitly update them, not on every kernel launch.
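A structured data region is another way to follow this advice when the structs are only needed within one scope. The sketch below is hypothetical (the `run` driver, the step loop, and the kernel body are illustrative): the structs are copied to the device once when the region opens, and before each launch only `p1`, the one that changes, is refreshed with `update device`.

```cpp
struct Point { double x, y, z; };

// Host globals; p1 changes between launches, p2 stays fixed.
Point p1, p2;

// Hypothetical driver: runs `steps` launches, moving p1 on the host between them,
// and returns the last element of the final result.
double run(int steps, double* out, int n) {
    // Copy the structs in once; out only needs device storage.
    #pragma acc data copyin(p1, p2) create(out[0:n])
    {
        for (int s = 0; s < steps; ++s) {
            p1.x += 1.0;                    // host-side update
            #pragma acc update device(p1)   // only p1 changed, so only p1 moves
            #pragma acc parallel loop present(p1, p2, out[0:n])
            for (int i = 0; i < n; ++i) out[i] = p1.x * p2.y;
        }
        // Bring the final result back to the host before the region ends.
        #pragma acc update self(out[0:n])
    }
    return out[n - 1];
}
```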

Thank you, Mat. This is very helpful.