I use the environment variable PGI_ACC_NOTIFY set to 2 to get information about data transfers from the host to the GPU. I have a simulation that stops with an "out of memory" error, and I would like to build a memory profile to make sure that only the data actually needed on the device is copied over, and that no other data is transferred inadvertently because of a bug in the code.
First, I noticed that the data of an array seems to be transferred (and reported) in several chunks: n equally sized chunks plus one chunk of a different size that makes up the remainder of the total.
However, when I add up all the reported bytes in a spreadsheet, I get a number much lower than the one reported by the out-of-memory error:
Out of memory allocating 186119568 bytes of device memory total/free CUDA memory: 6039339008/154578944
The total computed in the spreadsheet (I rechecked it several times) is about 2.1 GB, so I am wondering who/what is using up the difference of about 3.x GB.
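To double-check the spreadsheet, I sum the bytes directly from the notify output (which the runtime writes to stderr, so I capture it with something like `./a.out 2> acc_notify.log`). Below is a minimal sketch of that summation; it assumes each upload line contains a `bytes=<N>` field, which is how my runtime formats them — the exact line format may differ between compiler versions, and the sample lines are made up for illustration:

```python
import re

# Made-up sample lines in the assumed PGI_ACC_NOTIFY=2 format;
# in practice, read these from the captured acc_notify.log instead.
sample_log = """\
upload CUDA data file=sim.f90 function=init line=42 device=0 variable=a bytes=524288000
upload CUDA data file=sim.f90 function=init line=42 device=0 variable=a bytes=524288000
upload CUDA data file=sim.f90 function=init line=42 device=0 variable=a bytes=12345678
"""

def total_upload_bytes(log_text):
    """Sum the bytes= fields of all 'upload' lines in a notify log."""
    total = 0
    for line in log_text.splitlines():
        if "upload" in line:
            m = re.search(r"bytes=(\d+)", line)
            if m:
                total += int(m.group(1))
    return total

print(total_upload_bytes(sample_log))  # prints 1060921678 for the sample above
```

This reproduces the same ~2.1 GB figure as the spreadsheet when run against my real log.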
Does PGI_ACC_NOTIFY=2 not report all data transfers? Or is there some auxiliary data that is needed/allocated on the GPU alongside my explicit data transfers?