Halo exchange updates with MPI and OpenACC

Hi,

I am trying to add OpenACC directives to an in-house solver, and I am having a problem with the halo exchange updates. I am using the NVIDIA HPC SDK version 22.3 with the managed memory flag.
In the function below, when I add the OpenACC directives and then compute the updates, the code becomes very slow.

When I add the pragmas to port the MPI communication to the GPU, as seen below, it has no effect and the performance stays the same.

Is this because I am using the managed flag, so it doesn't work this way? Or am I missing something?

Correct. It's a known limitation in several of the MPI implementations: when passing in CUDA Unified Memory pointers (i.e. "managed"), MPI doesn't know whether the memory is dirty or not, so it ends up falling back to host-side communication. To get the benefit of CUDA-aware MPI and GPUDirect communication, you'll want to use OpenACC data regions to manage your data rather than relying on UM.
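For illustration, here's a minimal sketch of that pattern in C. The buffer names, sizes, and neighbor ranks are hypothetical placeholders (your solver's actual code isn't shown here): the key points are the explicit data region instead of managed memory, and the `host_data use_device` region that hands the device pointers to a CUDA-aware MPI so the exchange can go GPU-to-GPU.

```c
#include <mpi.h>

/* Hypothetical halo exchange sketch: send_buf/recv_buf, n, and the
   left/right neighbor ranks are illustrative, not from the original code. */
void halo_exchange(double *send_buf, double *recv_buf, int n,
                   int left, int right, MPI_Comm comm)
{
    /* Explicit data region: the buffers get device copies managed by
       OpenACC rather than by CUDA Unified Memory. */
    #pragma acc data copyin(send_buf[0:n]) copyout(recv_buf[0:n])
    {
        /* ... device kernels that pack send_buf ... */

        /* Inside host_data, send_buf/recv_buf resolve to the *device*
           addresses, so a CUDA-aware MPI can transfer directly between
           GPUs instead of staging through the host. */
        #pragma acc host_data use_device(send_buf, recv_buf)
        {
            MPI_Sendrecv(send_buf, n, MPI_DOUBLE, right, 0,
                         recv_buf, n, MPI_DOUBLE, left, 0,
                         comm, MPI_STATUS_IGNORE);
        }

        /* ... device kernels that unpack recv_buf ... */
    }
}
```

With the data managed this way, you'd compile without the managed flag (or at least keep these buffers out of UM) so MPI sees true device pointers.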

-Mat