CUDA class - allocate memory using malloc (Dynamic Global Memory Allocation and Operations)

nico88chessa · January 30, 2017, 9:23pm

Hi everybody,

I have a problem using malloc inside device code and then copying that data to or from host memory.
I start with this code:

#include <cstdio>
#include <cstdlib>
#include <new>
#include <cuda.h>
#include <cuda_runtime_api.h>

class CuMat {
public:
    typedef CuMat* Ptr;
    char* data;

    __device__ CuMat() {
        unsigned int sizeByte = 10;
        data = (char *) malloc(sizeByte); // in-kernel malloc: allocates from the device heap
        printf("allocated %u bytes at position %p\n", sizeByte, (void *) data);
        memset(data, 1, sizeByte);
    }
};

// construct a CuMat in place in memory that was allocated with cudaMalloc
__global__ void initCudaMat(CuMat* mat) {
    new (mat) CuMat();
}

// from Host

void testFunction() {

    char* test = (char*) malloc(10); // allocate data on host

    CuMat::Ptr cuSrc;
    cudaError_t error = cudaMalloc(&cuSrc, sizeof(CuMat)); // allocate data on device
    initCudaMat<<< 1 , 1 >>>(cuSrc);           // initialize device

    cudaDeviceSynchronize();

    CuMat::Ptr cuSrcHost = (CuMat::Ptr) malloc(sizeof(CuMat));
    error = cudaMemcpy(cuSrcHost, cuSrc, sizeof(CuMat), cudaMemcpyDeviceToHost); // OK

    error = cudaMemcpy(cuSrcHost->data, test, 10, cudaMemcpyDeviceToHost); // FAIL

}

The problem seems to be the code:
error = cudaMemcpy(cuSrcHost->data, test, 10, cudaMemcpyDeviceToHost);

but I don’t understand the cause of the problem.
It’s true that cuSrcHost is on the host, but cuSrcHost->data contains a pointer that resides in device heap memory (as described in B.18. Dynamic Global Memory Allocation and Operations of http://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf).

Or am I wrong?

What is my mistake?

Thanks to all!

nico88chessa · January 30, 2017, 9:37pm

Maybe this is my problem? cuda - How to copy the memory allocated in device function back to main memory - Stack Overflow
So memory allocated on the heap in device code is not accessible through the cudaMemcpy(…) API?
Why isn’t it compatible?

Robert_Crovella · February 2, 2017, 4:12am

Correct.

nico88chessa · February 2, 2017, 8:47pm

Thanks for the answer!

But I don’t understand: why does this limit exist?
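
A common workaround for the restriction confirmed above is to stage the data through a buffer allocated with cudaMalloc: a kernel copies the bytes out of the device heap into that buffer, and the host then calls cudaMemcpy on the staging buffer instead of on the in-kernel malloc pointer. The sketch below is only an illustration under assumptions: Mat is a simplified stand-in for CuMat, and the names initMat, stageOut, destroyMat and readDeviceHeap are hypothetical helpers that do not come from the original post.

```cuda
#include <cuda_runtime.h>

// Hypothetical, simplified stand-in for the CuMat class in the question.
struct Mat {
    char* data;  // points into the device heap once initMat has run
};

// Allocate n bytes on the device heap with in-kernel malloc and fill them with 1.
__global__ void initMat(Mat* mat, unsigned int n) {
    mat->data = static_cast<char*>(malloc(n));
    if (mat->data != nullptr) {
        memset(mat->data, 1, n);
    }
}

// Copy the device-heap bytes into a cudaMalloc buffer that the runtime API can read.
__global__ void stageOut(const Mat* mat, char* staging, unsigned int n) {
    for (unsigned int i = 0; i < n; ++i) {
        staging[i] = (mat->data != nullptr) ? mat->data[i] : 0;
    }
}

// Device-heap memory has to be released with device-side free(), not cudaFree.
__global__ void destroyMat(Mat* mat) {
    free(mat->data);
    mat->data = nullptr;
}

// Hypothetical helper: produce n bytes on the device heap and return them in hostOut.
cudaError_t readDeviceHeap(char* hostOut, unsigned int n) {
    Mat* dMat = nullptr;
    char* dStaging = nullptr;

    cudaError_t err = cudaMalloc(&dMat, sizeof(Mat));
    if (err != cudaSuccess) return err;
    err = cudaMalloc(&dStaging, n);
    if (err != cudaSuccess) { cudaFree(dMat); return err; }

    // Kernels on the same stream run in order; the heap allocation persists between them.
    initMat<<<1, 1>>>(dMat, n);
    stageOut<<<1, 1>>>(dMat, dStaging, n);
    destroyMat<<<1, 1>>>(dMat);
    err = cudaDeviceSynchronize();

    if (err == cudaSuccess) {
        // Both pointers are now ones the runtime API knows about.
        err = cudaMemcpy(hostOut, dStaging, n, cudaMemcpyDeviceToHost);
    }
    cudaFree(dStaging);
    cudaFree(dMat);
    return err;
}
```

Calling readDeviceHeap(buffer, 10) from host code should fill buffer with ten bytes of value 1, provided the in-kernel malloc succeeds. If the buffer size is known on the host before the kernel launches, allocating it with cudaMalloc up front and passing the pointer into the kernel avoids the extra copy entirely.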
