Access dynamically allocated memory from the host

I am trying to access, from the host, heap memory that was dynamically allocated inside a kernel. My approach was to first read back the address held by the device pointer to the allocated memory, then use that address in a memcpy operation, but the copy fails (the debugger reports error code 0xb). The address itself is read correctly: I print it in the kernel and again after copying it to the host, and the two values match.

I believe the programming guide does not explicitly state that this is impossible; however, the examples hint at such a restriction (e.g. persistently allocated memory is only used by subsequent kernel launches and is never referenced from the host).
Is there a way to achieve this, or am I missing something?
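A minimal sketch of one workaround, assuming the data only needs to reach the host after the kernels finish: the current CUDA Programming Guide states that memory obtained from device-side malloc() cannot be used in runtime or driver API calls such as cudaMemcpy, so a second kernel copies each heap allocation into an ordinary cudaMalloc'd staging buffer, which the host can then copy back as usual. The kernel names, the one-allocation-per-block layout, the helper `staged_value` and the fill pattern are all hypothetical.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical producer: thread 0 of each block allocates n ints on the
// device heap, fills them, and publishes the pointer in ptrs[blockIdx.x].
__global__ void allocate_and_fill(int **ptrs, int n)
{
    if (threadIdx.x != 0) return;
    int *p = static_cast<int *>(malloc(n * sizeof(int)));
    if (p != nullptr)
        for (int i = 0; i < n; ++i) p[i] = blockIdx.x * 1000 + i;
    ptrs[blockIdx.x] = p;  // nullptr if the device heap is exhausted
}

// Copies each heap allocation into a cudaMalloc'd staging buffer (usable by
// host API calls), then releases the heap memory with device-side free().
__global__ void copy_out_and_free(int **ptrs, int n, int *staging)
{
    int *p = ptrs[blockIdx.x];
    // p is identical for every thread in the block, so either all threads
    // return here or none do, and __syncthreads() below is reached by all.
    if (p == nullptr) return;
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        staging[blockIdx.x * n + i] = p[i];
    __syncthreads();  // every thread finishes reading before the free
    if (threadIdx.x == 0) free(p);
}

// Runs both kernels and returns element i of block b as seen by the host,
// or -1 if a CUDA call fails.
int staged_value(int blocks, int n, int b, int i)
{
    int **d_ptrs = nullptr;
    int *d_staging = nullptr;
    if (cudaMalloc(&d_ptrs, blocks * sizeof(int *)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_staging, blocks * n * sizeof(int)) != cudaSuccess) {
        cudaFree(d_ptrs);
        return -1;
    }

    allocate_and_fill<<<blocks, 1>>>(d_ptrs, n);
    copy_out_and_free<<<blocks, 128>>>(d_ptrs, n, d_staging);

    // Only the staging buffer, never a device-heap pointer, reaches cudaMemcpy.
    std::vector<int> host(blocks * n, -1);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(host.data(), d_staging, host.size() * sizeof(int),
                         cudaMemcpyDeviceToHost);
    cudaFree(d_ptrs);
    cudaFree(d_staging);
    return err == cudaSuccess ? host[b * n + i] : -1;
}
```

With 4 blocks of 16 elements, `staged_value(4, 16, 2, 5)` should return 2005. The cost is one extra device-to-device copy, in exchange for never passing a heap pointer to the runtime.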

Really, has nobody ever tried or needed something like this?

The question is: why do you need to do this? Isn't there another way?

The size of each allocation only becomes known at run time and differs from block to block. The number of allocations per kernel launch is dynamic as well, which makes host-side allocation quite hard: I would first need to allocate a variable-length array of pointers and then point each entry at a separate allocation, and I don't even know whether that can be done from the host. It would be much easier if each block determined its required allocation size (if it needs one at all), performed the allocation, and set its entry in the pointer array to the resulting address, which the host would then use.
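If each block can compute its size in a separate first pass, a common alternative (a different technique from the one proposed above) avoids the device heap entirely: one kernel records each block's element count, the host turns the counts into offsets with an exclusive prefix sum, allocates a single buffer with cudaMalloc, and a second kernel has each block write its slice at its offset. The sizing rule `required_count` and the helper `two_pass` are hypothetical stand-ins, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical sizing rule standing in for whatever each block decides at run time.
__device__ int required_count(int block) { return block % 3; }  // 0, 1 or 2 ints

// Pass 1: each block only reports how many elements it needs.
__global__ void count_pass(int *counts)
{
    if (threadIdx.x == 0) counts[blockIdx.x] = required_count(blockIdx.x);
}

// Pass 2: each block writes its slice [offsets[b], offsets[b + 1]) of one buffer.
__global__ void fill_pass(const int *offsets, int *out)
{
    int count = offsets[blockIdx.x + 1] - offsets[blockIdx.x];
    int *mine = out + offsets[blockIdx.x];
    for (int i = threadIdx.x; i < count; i += blockDim.x)
        mine[i] = blockIdx.x;  // tag each element with its block index
}

// Returns all slices concatenated in block order, copied to the host.
std::vector<int> two_pass(int blocks)
{
    int *d_counts = nullptr;
    cudaMalloc(&d_counts, blocks * sizeof(int));
    count_pass<<<blocks, 32>>>(d_counts);

    std::vector<int> counts(blocks, 0);
    cudaMemcpy(counts.data(), d_counts, blocks * sizeof(int), cudaMemcpyDeviceToHost);

    // Exclusive prefix sum: offsets[b] is where block b's slice starts.
    std::vector<int> offsets(blocks + 1, 0);
    for (int b = 0; b < blocks; ++b) offsets[b + 1] = offsets[b] + counts[b];
    int total = offsets[blocks];

    int *d_offsets = nullptr;
    int *d_out = nullptr;
    cudaMalloc(&d_offsets, (blocks + 1) * sizeof(int));
    cudaMalloc(&d_out, (total > 0 ? total : 1) * sizeof(int));
    cudaMemcpy(d_offsets, offsets.data(), (blocks + 1) * sizeof(int),
               cudaMemcpyHostToDevice);
    fill_pass<<<blocks, 32>>>(d_offsets, d_out);

    // d_out came from cudaMalloc, so an ordinary cudaMemcpy is valid here.
    std::vector<int> result(total);
    cudaMemcpy(result.data(), d_out, total * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_counts);
    cudaFree(d_offsets);
    cudaFree(d_out);
    return result;
}
```

For 5 blocks the counts are 0, 1, 2, 0, 1, so `two_pass(5)` should yield {1, 2, 2, 4}. Building an array of separately cudaMalloc'd pointers from the host is also possible (fill a host array with the device pointers and cudaMemcpy it to the device), but a single buffer with offsets needs only one allocation and one copy back.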

I agree that Section D.2.3 (Pointers) of the Programming Guide is a little vague on whether device-side pointers copied back to the host can be passed to a host-side function that takes a device-pointer argument (e.g. cuMemcpyHtoD/DtoH).

I would also like to know whether this is possible, or whether some pointer mangling still goes on under the hood, so that device pointers created by the host are not numerically equal to device pointers created by the device (when inspected from the host).

FWIW, I assume this will work fine if your application is running in a 64-bit environment that is unified address space capable.
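Whether a device supports unified virtual addressing can be queried through the runtime's cudaDevAttrUnifiedAddressing attribute; a minimal sketch follows, with `has_uva` as a hypothetical helper name. UVA support by itself may not settle the question, though: the current CUDA Programming Guide states that memory from device-side malloc() cannot be used in runtime or driver API calls.

```cuda
#include <cuda_runtime.h>

// Returns 1 if `device` supports unified virtual addressing, 0 if it does not,
// and -1 if the attribute query itself fails (e.g. no such device).
int has_uva(int device)
{
    int value = 0;
    if (cudaDeviceGetAttribute(&value, cudaDevAttrUnifiedAddressing, device) != cudaSuccess)
        return -1;
    return value ? 1 : 0;
}
```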