CUDA Unified Memory By PGI

Hi,

After I compile the code with the “-ta=tesla:managed” flag, how can I tell which data management is actually being used?

P.S.: I used the PGI compiler to build an open-source program, and the “-ta=tesla:managed” flag seems to have no effect.

best,

Jackie

Hi Jackie,

Setting the environment variable “PGI_ACC_NOTIFY=2” will show the data transfers between the device and host. If you compile with “-ta=tesla” and see the data transfers, then compile with “-ta=tesla:managed” and no longer see data transfers (since they would now be handled directly by the CUDA driver), then you know for certain that CUDA Unified Memory is in effect. You can also run your code through nvprof and see if there are differences.

Note that CUDA Unified Memory is only available for dynamic memory. Static data still needs to be managed via OpenACC data directives.

I used the PGI compiler to build an open-source program, and the “-ta=tesla:managed” flag seems to have no effect.

Does the code contain OpenACC directives?

  • Mat

Hi Mat,

First, I am having some trouble setting the environment variable “PGI_ACC_NOTIFY=2”. Do I just add it as a flag to configure?

Second, if I don’t manage static data when using CUDA Unified Memory, what will happen?

Third, yes, the code contains many OpenACC directives. Is there anything I should keep an eye on in order to make good use of them?

best,

Jackie

Hi Jackie,

Environment variables are set in your shell. If you are using csh, use “setenv PGI_ACC_NOTIFY 2”. For bash use “export PGI_ACC_NOTIFY=2”. Set this before running your program.
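For example, in a bash shell (the program name is illustrative; substitute your own executable):

```shell
# Enable data-transfer messages from the OpenACC runtime
export PGI_ACC_NOTIFY=2
echo "PGI_ACC_NOTIFY=$PGI_ACC_NOTIFY"

# ./a.out    <- now run your OpenACC program here;
#               transfer messages are printed to stderr

# csh equivalent:
#   setenv PGI_ACC_NOTIFY 2

# Turn the messages back off when done
unset PGI_ACC_NOTIFY
```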

Since the program already uses OpenACC, data movement is handled by the directives and the OpenACC runtime. CUDA Unified Memory would be used only for dynamic data, where it overrides the OpenACC runtime’s management. Static data would still be managed by the OpenACC runtime.

  • Mat

Thank you, that helped me.

Hi Mat,

I get an error when using CUDA Unified Memory.

malloc: cuMemMallocManaged returns error code 8010: ALLOCATE: 400000 bytes requested; not enough memory

My memory information is listed below.

[webber@localhost RE__DEPLOYMENT_RESTRICTED]$ free
              total        used        free      shared  buff/cache   available
Mem:        3806804     1411072      157988        6308     2237744     2054228
Swap:      10485756      244928    10240828

[webber@localhost RE__DEPLOYMENT_RESTRICTED]$ nvidia-smi
Wed Apr  6 09:04:16 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.39     Driver Version: 352.39         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 705      Off  | 0000:01:00.0     N/A |                  N/A |
| 15%   38C    P8    N/A /  N/A |    95MiB /  1023MiB  |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+

best,

Jackie