PGI 13.2 compiler and pinned memory

Hello,

I have a working OpenACC code that I used to compile with version 12.10 of the PGI compiler. In the code I use pinned memory to reduce CPU-GPU communication time.
When I run the same code compiled with version 13.2, I get the following error:
call to cuMemHostRegister returned error 208: Already mapped

The only fix I have found is to remove “pinned” from the code.

When I looked into why the code doesn’t work with pinned memory allocation, I found that the 13.2 compiler calls cuMemHostRegister when copying data from the CPU to the GPU, whereas 12.10 uses cuMemcpyHtoD.

According to the manual page for cuMemHostRegister (http://www.clear.rice.edu/comp422/resources/cuda/html/group__CUDA__MEM_gf0a9fe11544326dabd743b7aa6b54223.html), it “Page-locks the memory range specified by p and bytesize and maps it for the device(s) as specified by Flags”.

So it seems that the current 13.2 version of the PGI compiler doesn’t support pinned memory usage.
Is that correct?

Actually, we’re using pinned memory by default in 13.x so it may be interfering with your implementation. I’ve had one other issue with this myself (a performance regression) so our current plan is to make the use of pinned memory optional, toggled via a compiler flag and/or environment variable. Hopefully it will solve your issue as well.

Please feel free to send me (via PGI Customer Service trs@pgroup.com), a reproducing example of your issue and I will add it to my issue report.

Thanks,
Mat
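
The conflict discussed above (host memory that is already page-locked being registered a second time) can be tried outside of OpenACC. The following is only a minimal sketch using the CUDA runtime API, not the code the PGI runtime actually executes: the buffer size is arbitrary, and the runtime calls cudaMallocHost and cudaHostRegister stand in for the application's “pinned” allocation and the runtime's cuMemHostRegister call. The second call would be expected to fail; the exact error code and message may depend on the CUDA version.

```cuda
// Sketch: registering host memory that is already page-locked.
// Assumption: this mimics the application/runtime conflict with plain
// CUDA runtime calls; it is not taken from the PGI runtime.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1 << 20;  // 1 MiB, arbitrary size for illustration
    void* host = nullptr;

    // Step 1: the application allocates pinned (page-locked) host memory.
    cudaError_t err = cudaMallocHost(&host, bytes);
    if (err != cudaSuccess) {
        std::printf("cudaMallocHost failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Step 2: a second component tries to page-lock the same range again.
    err = cudaHostRegister(host, bytes, cudaHostRegisterDefault);
    std::printf("cudaHostRegister on pinned memory: %s\n",
                cudaGetErrorString(err));

    // Unregister only if the second registration actually succeeded.
    if (err == cudaSuccess) {
        cudaHostUnregister(host);
    }
    cudaFreeHost(host);
    return 0;
}
```

This can be built with nvcc (for example `nvcc double_pin.cu -o double_pin`, where the file name is hypothetical) and needs a CUDA-capable GPU to run.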

Thank you, Mat.

That is strange, because I observe much slower copies with version 13.2 than I had with version 12.10 and “pinned” memory allocation.

How do you specify “using pinned memory by default”?
Do you have any ideas why the results are so different?

Unfortunately, I can’t give you access to the code, because I don’t have permission to share it.

Thank you,

Irina

Hi Irina,

I have one benchmark code that also slowed down with this change. The change mostly helped, but there were a few spots in which it caused problems. I’m still investigating exactly why these few codes slowed down, but I do know the slowdown is caused by the pinned memory. In the short term, we’ll add a flag and an environment variable to disable this feature.

Best Regards,
Mat