Hi,
I am using PGI 19.10. I need to pass a few simple POD objects and a few scalars to an OpenACC kernel.
struct Point { double x,y,z; };
Point p1,p2,p3,p4,p5,p6;
These objects are stored and updated on the host, and are used as input for GPU kernels.
If I were writing in CUDA, I would pass them to the kernels by value, and I had been assuming that OpenACC would do the same for me. I just wrote things like
// compile with "release build" flags
#pragma acc parallel loop independent
for (int i = 0; i < N; ++i) {
// use p1 ~ p6 here as input
}
and never noticed any problems. It is possible that there were problems and I was simply not aware of them.
I recently compiled the code with debug flags and noticed that the PGI compiler reported “implicit copy (if not already present)” for these POD objects, a message I had never seen in the release build. It seemed that if the POD objects were modified on the host between two calls of this kernel, the second call did not see the changes, and moreover, the second call overwrote the host objects with their old values.
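To make the difference concrete, here is a minimal sketch that spells out the data clauses explicitly. It assumes the OpenACC default rules as I understand them: an aggregate such as Point that appears in no data clause is treated as copy (which only transfers if the object is not already present on the device), while a scalar is treated as firstprivate. The function scaled_sum and its arguments are hypothetical and not part of my real code.

```cpp
struct Point { double x, y, z; };

// Hypothetical kernel: sums scale * (p.x + p.y + p.z) over n iterations.
double scaled_sum(Point p, double scale, int n) {
    double total = 0.0;
    // copy(p): the clause the compiler reported implicitly for the struct.
    //          It transfers p only if p is not already present on the device.
    // firstprivate(scale): assumed default for a scalar; the current host
    //          value is passed at every launch, much like a CUDA by-value argument.
    #pragma acc parallel loop copy(p) firstprivate(scale) reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += scale * (p.x + p.y + p.z);
    }
    return total;
}
```

If p already has a device copy from an enclosing data region, the copy clause does nothing at launch, which would be consistent with the stale values I saw.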
That message, “implicit copy (if not already present)”, helped me see why this was happening. I changed my code so that it does
#pragma acc enter data create(p1,p2,p3,p4,p5,p6)
#pragma acc exit data delete(p1,p2,p3,p4,p5,p6)
somewhere, and used
#pragma acc update device(p1,p2,p3,p4,p5,p6)
immediately after the POD objects are updated on the host. I also added a “present” clause to the kernel:
#pragma acc parallel loop independent present(p1,p2,p3,p4,p5,p6)
for (int i = 0; i < N; ++i) {
// use p1 ~ p6 here as input
}
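Put together, the pattern looks roughly like the following sketch. It is a reduced example, not my real code: the function demo, the output array out, and the single object p1 are placeholders chosen so that the example is self-contained.

```cpp
#include <vector>

struct Point { double x, y, z; };

// Hypothetical demo: launches the same kernel twice, modifying p1 on the
// host in between, and returns out[0] from the second launch.
double demo(int n) {
    Point p1 = {1.0, 2.0, 3.0};
    std::vector<double> out_vec(n, 0.0);
    double* out = out_vec.data();

    // Allocate device copies once, up front.
    #pragma acc enter data create(p1, out[0:n])

    // Push the current host value before the first launch.
    #pragma acc update device(p1)
    #pragma acc parallel loop independent present(p1, out[0:n])
    for (int i = 0; i < n; ++i) {
        out[i] = p1.x + p1.y + p1.z;
    }

    // Modify on the host, then refresh the device copy explicitly.
    p1.x = 10.0;
    #pragma acc update device(p1)
    #pragma acc parallel loop independent present(p1, out[0:n])
    for (int i = 0; i < n; ++i) {
        out[i] = p1.x + p1.y + p1.z;
    }

    // Bring the results back and release the device copies.
    #pragma acc update self(out[0:n])
    #pragma acc exit data delete(p1, out[0:n])
    return out[0];
}
```

With this arrangement the second launch should see the updated p1, and because nothing is copied back for p1, the host value is not overwritten.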
I didn’t expect POD objects and scalars to be handled so differently. I wish I could treat them in the same way, and I wonder what the best practice for passing POD objects to a kernel is.
Thanks,
stw