Setting the environment variable “PGI_ACC_NOTIFY=2” will show the data transfers between the host and the device. Compile with “-ta=tesla” and check that you see data transfers. Then recompile with “-ta=tesla:managed”. If the transfers no longer appear (because they are now handled directly by the CUDA driver), you know for certain that CUDA Unified Memory is in effect. You can also run both builds through nvprof and compare the profiles.
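A minimal test program for this check might look like the sketch below. It assumes a C code base, and the function names `make_array` and `scale_array` are hypothetical. The explicit `copy` clause makes the transfer visible under “-ta=tesla”. Under “-ta=tesla:managed” the array comes from malloc and so should be in managed memory, in which case the transfer is expected to disappear from the PGI_ACC_NOTIFY output.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate n floats with malloc and fill them with 0..n-1.
 * Under -ta=tesla:managed the PGI compiler places malloc'd memory in
 * CUDA Unified Memory; under plain -ta=tesla it is ordinary host memory. */
float *make_array(int n)
{
    float *a = malloc((size_t)n * sizeof *a);
    if (a != NULL) {
        for (int i = 0; i < n; ++i)
            a[i] = (float)i;
    }
    return a;
}

/* Hypothetical kernel: multiply every element by s on the device.
 * With -ta=tesla, the copy clause should show up as host-to-device and
 * device-to-host transfers when PGI_ACC_NOTIFY=2 is set. With
 * -ta=tesla:managed, a is in managed memory and those transfers are
 * expected to disappear from the notify output. */
void scale_array(float *a, int n, float s)
{
#pragma acc parallel loop copy(a[0:n])
    for (int i = 0; i < n; ++i)
        a[i] *= s;
}
```

Call both functions from a short main. Build the program once with “pgcc -acc -ta=tesla” and once with “pgcc -acc -ta=tesla:managed”. Then run each binary with PGI_ACC_NOTIFY=2 set and compare the output.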
Note that CUDA Unified Memory is only used for dynamically allocated memory (for example, malloc in C, new in C++, or allocate in Fortran). Static data still needs to be managed via OpenACC data directives.
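The sketch below shows what this means in practice. It assumes C, and the names `coeffs`, `init_coeffs`, and `apply_coeffs` are hypothetical. The file-scope array is static, so it still needs `declare` and `update` directives even under “-ta=tesla:managed”. The output buffer is expected to come from malloc, so it is dynamic data and would be placed in managed memory.

```c
#define NCOEFF 4

/* Static (file-scope) data: NOT covered by -ta=tesla:managed, so a device
 * copy is created with a declare directive and kept in sync explicitly. */
static float coeffs[NCOEFF];
#pragma acc declare create(coeffs)

/* Hypothetical setup: fill coeffs on the host, then push them to the device. */
void init_coeffs(void)
{
    for (int k = 0; k < NCOEFF; ++k)
        coeffs[k] = (float)(k + 1);
#pragma acc update device(coeffs)
}

/* Hypothetical kernel: out[i] = coeffs[i % NCOEFF] * x.
 * out is assumed to come from malloc (dynamic data). Under -ta=tesla:managed
 * it lives in managed memory and the copyout clause is expected to be skipped
 * by the runtime; under plain -ta=tesla the copyout performs the transfer.
 * coeffs must already be present on the device in both builds. */
void apply_coeffs(float *out, int n, float x)
{
#pragma acc parallel loop present(coeffs) copyout(out[0:n])
    for (int i = 0; i < n; ++i)
        out[i] = coeffs[i % NCOEFF] * x;
}
```

Dropping the `declare`/`update` pair here would leave the device without a valid copy of `coeffs`, no matter which “-ta” option is used.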
I’m using the PGI compiler to build an open-source software package, and adding the “-ta=tesla:managed” flag seems to have no effect.
Environment variables are set in your shell. If you are using csh, use “setenv PGI_ACC_NOTIFY 2”; for bash, use “export PGI_ACC_NOTIFY=2”. Set this before running your program.
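Since the program already uses OpenACC, data movement is handled by its directives and the OpenACC runtime. With “-ta=tesla:managed”, CUDA Unified Memory is used only for dynamic data, where it overrides the management by the OpenACC runtime. Static data is still managed by the OpenACC runtime. Because the program already moves its data correctly, it should produce the same results either way, which is why the flag can appear to have no effect.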
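To illustrate, here is a hypothetical fragment in the style of an existing OpenACC code. The routine name `smooth` and the arrays `u` and `tmp` are assumptions for illustration. The data region already moves both arrays explicitly, so the routine is correct with or without “-ta=tesla:managed”. If the arrays were malloc'd, building with the flag is expected to hand them to CUDA Unified Memory instead of the data clauses. Nothing in the source would change.

```c
/* Hypothetical routine from an existing OpenACC code: three-point smoothing
 * of the interior points, repeated iters times. The data region keeps u and
 * tmp on the device across all iterations, so without -ta=tesla:managed the
 * transfers happen only at the region boundaries. If u and tmp were malloc'd
 * and the code is built with -ta=tesla:managed, they are expected to live in
 * managed memory instead, and the results stay the same. */
void smooth(float *u, float *tmp, int n, int iters)
{
#pragma acc data copy(u[0:n]) create(tmp[0:n])
    {
        for (int it = 0; it < iters; ++it) {
            /* Average each interior point with its two neighbours. */
#pragma acc parallel loop present(u[0:n], tmp[0:n])
            for (int i = 1; i < n - 1; ++i)
                tmp[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0f;
            /* Copy the smoothed values back; endpoints are left unchanged. */
#pragma acc parallel loop present(u[0:n], tmp[0:n])
            for (int i = 1; i < n - 1; ++i)
                u[i] = tmp[i];
        }
    }
}
```

Because the data movement is already explicit, comparing PGI_ACC_NOTIFY or nvprof output between the two builds is a more reliable way to see the flag's effect than looking at program results.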