''cudaMemcpy'' failed to copy from device memory dynamically allocated using ''malloc''

Ziqi · October 24, 2022, 4:59pm

According to Programming Guide :: CUDA Toolkit Documentation, we may use malloc and free in CUDA kernels or device functions to dynamically allocate or deallocate global memory. I am wondering if cudaMemcpy can be used to copy data from a pointer returned by malloc to host storage. For clarity, suppose we have

__global__ void SOME_KERNEL(SOME_STRUCT* obj, ...)
{
    ...
    obj->SOME_MEMBER_PTR = (TYPE_OF_MEMBER *)malloc(SOME_SIZE_OF_MEMBER);
    ...
}

It is my understanding that we can first copy the object of SOME_STRUCT to host using, say,

cudaMemcpy((SOME_STRUCT*) obj_host, (SOME_STRUCT*) obj_device, sizeof(SOME_STRUCT), cudaMemcpyDeviceToHost);

For clarity, I am using the modifier (SOME_STRUCT*) to emphasize the type of data. I understand that (void *) is the standard type to use for cudaMemcpy calls.

Then obj_host->SOME_MEMBER_PTR is a pointer stored on the host whose value, I assume, is translated to a device address through some look-up table. Intuitively, we should be able to use the following to copy the data stored in the dynamically allocated memory:

cudaMemcpy((TYPE_OF_MEMBER*)mem_host, (TYPE_OF_MEMBER*)obj_host->SOME_MEMBER_PTR, sizeof(TYPE_OF_MEMBER), cudaMemcpyDeviceToHost);

However, it seems that this approach does not work for memory allocated dynamically inside a kernel. In contrast, there is no problem copying data from memory allocated by cudaMalloc. Could anyone confirm my finding and explain why cudaMemcpy cannot be used to copy from memory allocated with in-kernel malloc?

rs277 · October 24, 2022, 6:01pm

I may be missing something, but looking here, I understand the labels on the left hand side to indicate the domains in which the functions can be called.

cudaMemcpy appears to be only usable on the host.

Ziqi · October 24, 2022, 6:27pm

cudaMemcpy is indeed called on the host. The question is whether it can be used to copy data from dynamically allocated device memory to the host. Copying data from device memory allocated using cudaMalloc works without a doubt.

rs277 · October 24, 2022, 6:34pm

My misunderstanding. I thought you wanted to call cudaMemcpy on the device.

striker159 · October 25, 2022, 7:07am

The linked section of the programming guide clearly states that data allocated with in-kernel malloc cannot be copied directly to the host:

In addition, memory allocated by a call to malloc() or __nv_aligned_device_malloc() in device code cannot be used in any runtime or driver API calls (i.e. cudaMemcpy, cudaMemset, etc).

If you want to copy the data to the host, you first need to copy it, via a kernel, into memory allocated with cudaMalloc*, and then transfer that memory to the host.
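
A minimal sketch of that two-step approach may make it more concrete. The names Holder, allocate_and_fill, stage_to_global, release, and the CHECK macro are hypothetical and chosen only for illustration; Holder stands in for SOME_STRUCT from the question. The sketch assumes the in-kernel malloc succeeds within the default device heap size.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for SOME_STRUCT from the question.
struct Holder {
    int *data;   // set by in-kernel malloc(); points into the device heap
    int  count;  // number of elements behind data
};

// Abort with a message if a runtime API call fails (illustrative helper).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// One thread allocates from the device heap and fills in some values.
__global__ void allocate_and_fill(Holder *obj, int count)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        obj->data  = static_cast<int *>(malloc(count * sizeof(int)));
        obj->count = (obj->data != nullptr) ? count : 0;
        for (int i = 0; i < obj->count; ++i) obj->data[i] = i * i;
    }
}

// Step 1: a kernel copies from the device heap into a cudaMalloc buffer.
__global__ void stage_to_global(const Holder *obj, int *staging)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < obj->count) staging[i] = obj->data[i];
}

// Device-heap memory can only be released by free() in device code.
__global__ void release(Holder *obj)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        free(obj->data);
        obj->data = nullptr;
    }
}

int main()
{
    const int count = 8;
    Holder *obj_device = nullptr;
    int *staging = nullptr;
    CHECK(cudaMalloc(&obj_device, sizeof(Holder)));
    CHECK(cudaMalloc(&staging, count * sizeof(int)));
    CHECK(cudaMemset(staging, 0, count * sizeof(int)));

    allocate_and_fill<<<1, 1>>>(obj_device, count);
    stage_to_global<<<1, 32>>>(obj_device, staging);
    CHECK(cudaGetLastError());

    // Step 2: the staging buffer came from cudaMalloc, so cudaMemcpy may read it.
    int host[count];
    CHECK(cudaMemcpy(host, staging, sizeof(host), cudaMemcpyDeviceToHost));
    for (int i = 0; i < count; ++i) std::printf("%d ", host[i]);
    std::printf("\n");

    release<<<1, 1>>>(obj_device);
    CHECK(cudaDeviceSynchronize());
    CHECK(cudaFree(staging));
    CHECK(cudaFree(obj_device));
    return 0;
}
```

The host never dereferences or passes obj->data to a runtime API call; only kernels touch the device-heap pointer. Copying the struct itself with cudaMemcpy is fine because obj_device was allocated with cudaMalloc, but the pointer member it contains is only meaningful inside device code.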

Ziqi · October 25, 2022, 3:43pm

I see. Thanks!
