The helpful NVIDIA blog article Accelerating Standard C++ with GPUs Using stdpar explains that, through CUDA Unified Memory, data dynamically allocated on the heap in CPU code compiled by NVC++ can be managed automatically. Even so, we are still running into cudaErrorIllegalAddress runtime errors, which in the end turn out to be caused by mistaken use of stack data. I wonder if there are any plans to also support automatic management of CPU stack data.
More immediately, I would like to be able to determine whether a pointer has been allocated on the heap. Is there a function which can confirm that a pointer is targeting heap-allocated data?
Unfortunately, this support requires changes to the Linux OS itself, which have been proposed but not adopted by the Linux folks. Hopefully we'll be able to convince the community to adopt this change at some point in the future, but currently it is not supported.
Is there a function which can confirm that a pointer is targeting heap-allocated data?
Not that I’m aware of. Basically, you’d need to inspect the pointer’s address and try to determine in which region of memory it is located. Doing a web search, I found the following post. Although a bit old, it may be helpful.
Regarding the function to determine if an address was allocated on the heap: any code that allocates on the heap must be compiled by NVC++ for the allocation to be managed. This implies that NVC++ has knowledge of the heap allocation calls, and I was wondering where that information is stored. It made me think that, since NVC++ has that extra knowledge, perhaps it could provide a function which shares some of that information with the user?
I’ll run this by our engineers in the next C++ team meeting (next Thursday), but I’m not sure what can be done.
For CUDA Unified Memory, we basically replace the calls to new with calls to cudaMallocManaged. Although I haven’t used it myself, there is a CUDA runtime call, cudaPointerGetAttributes, which you can use to see whether the pointer is managed or not. Not a great solution, but it might give you what you need. See: CUDA Runtime API :: CUDA Toolkit Documentation
The better solution is for HMM to be adopted by the Linux OS so the device can directly access stack and static data, but that’s outside of our control.
cudaPointerGetAttributes is looking rather good. I tried a small test with stack data; global data; operator new (NVC++); and operator new (GCC). The call to cudaPointerGetAttributes itself always returns cudaSuccess but the cudaPointerAttributes argument’s type member is set to cudaMemoryTypeManaged for pointers returned from calls to operator new made by NVC++, and cudaMemoryTypeUnregistered otherwise. While we all wait for HMM adoption, this can at least identify where user action is required.