According to the Programming Guide in the CUDA Toolkit Documentation, we may use malloc and free inside CUDA kernels or device functions to dynamically allocate or deallocate global memory. I am wondering whether cudaMemcpy can be used to copy data from a pointer returned by such an in-kernel malloc back to host storage. For clarity, suppose we have
__global__ void SOME_KERNEL(SOME_STRUCT* obj, ...)
{
...
obj->SOME_MEMBER_PTR = (TYPE_OF_MEMBER *)malloc(SOME_SIZE_OF_MEMBER);
...
}
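For concreteness, here is one way the placeholder above might be filled in (the struct, member name, and fill loop are hypothetical, just to make the scenario compilable):

```cuda
struct SomeStruct {
    float* values;  // to be allocated inside the kernel
};

__global__ void someKernel(SomeStruct* obj, int n)
{
    // malloc here draws from the device heap (sized via
    // cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...)),
    // not from memory obtained with cudaMalloc on the host
    obj->values = (float*)malloc(n * sizeof(float));
    if (obj->values != nullptr) {
        for (int i = 0; i < n; ++i)
            obj->values[i] = (float)i;
    }
}
```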
It is my understanding that we can first copy the SOME_STRUCT object to the host using, say,
cudaMemcpy((SOME_STRUCT*) obj_host, (SOME_STRUCT*) obj_device, sizeof(SOME_STRUCT), cudaMemcpyDeviceToHost);
For clarity, I am writing the casts (SOME_STRUCT*) to emphasize the type of the data; I understand that cudaMemcpy actually takes (void *) arguments.
Then obj_host->SOME_MEMBER_PTR still holds a device address: even though the struct itself has been copied to the host, the member pointer refers to the memory allocated on the device. Intuitively, we should therefore be able to copy the data stored in that dynamically allocated memory with:
cudaMemcpy((TYPE_OF_MEMBER*)mem_host, (TYPE_OF_MEMBER*)obj_host->SOME_MEMBER_PTR, sizeof(TYPE_OF_MEMBER), cudaMemcpyDeviceToHost);
However, this approach appears not to work for memory allocated with in-kernel malloc: the cudaMemcpy call fails. By contrast, there is no problem copying data from memory allocated on the host with cudaMalloc. Could anyone confirm my finding and explain why cudaMemcpy cannot be used to copy from memory dynamically allocated inside a kernel?
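To make the failure concrete, here is a minimal self-contained reproducer of what I am describing (the struct and variable names are made up; the cudaGetErrorString call is only there to surface the error, and running this requires a CUDA-capable GPU):

```cuda
#include <cstdio>

struct SomeStruct { int* data; };

__global__ void allocKernel(SomeStruct* obj, int n)
{
    // allocates from the device heap, not from cudaMalloc'd memory
    obj->data = (int*)malloc(n * sizeof(int));
    if (obj->data != nullptr) {
        for (int i = 0; i < n; ++i)
            obj->data[i] = i;
    }
}

int main()
{
    const int n = 4;
    SomeStruct* obj_device;
    cudaMalloc(&obj_device, sizeof(SomeStruct));
    allocKernel<<<1, 1>>>(obj_device, n);
    cudaDeviceSynchronize();

    // Step 1: copy the struct itself back; this works because
    // obj_device was allocated with cudaMalloc.
    SomeStruct obj_host;
    cudaMemcpy(&obj_host, obj_device, sizeof(SomeStruct),
               cudaMemcpyDeviceToHost);

    // Step 2: try to copy from the device-heap pointer stored in
    // the struct; in my tests this is the call that fails.
    int mem_host[n];
    cudaError_t err = cudaMemcpy(mem_host, obj_host.data,
                                 n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("cudaMemcpy from in-kernel malloc: %s\n",
           cudaGetErrorString(err));

    cudaFree(obj_device);
    return 0;
}
```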