malloc inside a CUDA kernel: allocating memory for a pointer declared in host code

Hi,

Does the malloc function work inside a CUDA kernel? Dynamic memory allocation inside a kernel may be possible, but perhaps not with the malloc function.

Can someone explain whether and how it can be done?

The CUDA development environment is:

nvidia-smi -q


GPU 0:2:0
Product Name : Tesla C2050 / C2070
Display Mode : Enabled
Persistence Mode : Disabled

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2011 NVIDIA Corporation
Built on Thu_May_12_11:09:45_PDT_2011
Cuda compilation tools, release 4.0, V0.2.1221

By the way, the following code snippet needs dynamic allocation inside a CUDA kernel:

typedef struct Para
{

int x;
float f;
};

typedef struct Root
{

float a,b;
Para *para;
};

__global__ void mlc_ker(struct Root *root_dev)
{

struct Para *para_dev;

para_dev = (struct Para *) malloc(1*sizeof(struct Para));

root_dev->para = para_dev;
para_dev->x = 25;
para_dev->f = 30.12;

}

int main()
{

struct Root *root, *root_dev;

printf("\n Size of Root with Para pointer = %d \n", sizeof(Root));
root = (Root *) malloc(1*sizeof(Root));

root->a=10.11;
root->b=11.11;

cudaMalloc( (void **) &root_dev, 1*sizeof(Root));
cudaMemcpy(root_dev, root, 1*sizeof(Root), cudaMemcpyHostToDevice);

mlc_ker<<<1,1>>> (root_dev);

cudaMemcpy (root, root_dev, 1*sizeof(Root), cudaMemcpyDeviceToHost);

printf("\n Size of Root with Para pointer = %d \n", sizeof(Root));

printf("CONTENTS AFTER CUDA CALL : root->a = %f \n root->b = %f \n root->para->x = %d \n root->para->f = %f \n", root->a, root->b, root->para->x, root->para->f);

return 0;

}

Compilation fails with the following output:

nvcc mem_alc.cu -o mem_alc

mem_alc.cu(9): warning: declaration requires a typedef name

mem_alc.cu(16): warning: declaration requires a typedef name

mem_alc.cu(24): error: calling a host function("malloc") from a device/global function("mlc_ker") is not allowed

1 error detected in the compilation of "/tmp/tmpxft_00002391_00000000-4_mem_alc.cpp1.ii".

Can someone explain how dynamic memory allocation works inside CUDA kernels? And is the above requirement achievable, i.e., allocating memory inside a CUDA kernel for a pointer declared in host code?

Thanks in advance,

Device-side malloc() requires compute capability 2.0 or higher, so make sure you compile for it (-arch sm_20).
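For reference, a minimal sketch of device-side malloc()/free() under the assumptions of this thread (CUDA 4.0, a compute capability 2.0 card, built with -arch sm_20). The kernel heap_demo, the helper run_heap_demo and the 16 MB heap size are illustrative choices, not anything from the original posts. Each thread allocates a small buffer, checks for NULL (malloc returns NULL when the device heap runs out), uses it, and frees it again on the device:

```cuda
// Sketch for the CUDA 4.0 / Tesla C2050 setup in this thread; build with:
//   nvcc -arch=sm_20 heap_demo.cu -o heap_demo
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread allocates a small buffer from the device heap, uses it, then frees it.
__global__ void heap_demo(int *results, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    int *buf = (int *) malloc(4 * sizeof(int));   // device-side malloc (compute capability 2.0+)
    if (buf == NULL) {                            // NULL when the device heap is exhausted
        results[tid] = -1;
        return;
    }
    for (int i = 0; i < 4; ++i)
        buf[i] = tid * 10 + i;
    results[tid] = buf[0] + buf[3];               // (10*tid) + (10*tid + 3)
    free(buf);                                    // device-heap memory is released with device-side free()
}

// Runs the kernel with n threads (n <= 1024) and copies the results to host_results.
// Returns 0 on success, -1 on any CUDA error.
int run_heap_demo(int *host_results, int n)
{
    int *results_dev = NULL;
    if (cudaMalloc((void **) &results_dev, n * sizeof(int)) != cudaSuccess) return -1;

    heap_demo<<<1, n>>>(results_dev, n);
    if (cudaGetLastError() != cudaSuccess) {      // launch/configuration errors
        cudaFree(results_dev);
        return -1;
    }
    // Blocks until the kernel finishes; also surfaces errors raised while it ran.
    cudaError_t err = cudaMemcpy(host_results, results_dev, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(results_dev);
    return err == cudaSuccess ? 0 : -1;
}

int main()
{
    // Optional: enlarge the device heap (8 MB by default) before the first kernel that calls malloc.
    if (cudaDeviceSetLimit(cudaLimitMallocHeapSize, 16 * 1024 * 1024) != cudaSuccess) {
        fprintf(stderr, "could not set heap size\n");
        return 1;
    }

    int results[8];
    if (run_heap_demo(results, 8) != 0) {
        fprintf(stderr, "CUDA error\n");
        return 1;
    }
    for (int i = 0; i < 8; ++i)
        printf("results[%d] = %d\n", i, results[i]);
    return 0;
}
```

Allocations come from a fixed-size device heap, which is why the sketch resizes it with cudaDeviceSetLimit before the first launch that uses malloc; for a handful of small allocations the default size is enough.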

And fix the declaration of struct Root:

typedef struct Root
{
    float a,b;
    struct Para *para;
};

Now compilation is successful.

But it gives a segmentation fault. The reason may be that the Para pointer gets its memory allocated inside the CUDA kernel, while the host code never allocates memory for the struct Para pointer:

__global__ void mlc_ker(struct Root *root_dev){
    struct Para *para_dev;
    para_dev = (struct Para *) malloc(1*sizeof(struct Para));
    root_dev->para = para_dev;
    para_dev->x = 25;
    para_dev->f = 30.12;
}

But on the host, memory is allocated for the Root structure only:

root = (Root *) malloc(1*sizeof(Root));
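If the allocation does not actually have to happen inside the kernel, the usual alternative is the one hinted at above: allocate Para from host code with cudaMalloc, store that device pointer in Root before copying Root to the GPU, and afterwards copy the Para contents back through the same pointer. A minimal sketch under that assumption (fill_ker and fetch_para_host_alloc are hypothetical names; it needs no device-side malloc at all):

```cuda
// Sketch; build with: nvcc -arch=sm_20 host_alloc.cu -o host_alloc
#include <cstdio>
#include <cuda_runtime.h>

struct Para { int x; float f; };
struct Root { float a, b; struct Para *para; };

// root_dev->para already points to memory obtained with cudaMalloc on the host side.
__global__ void fill_ker(struct Root *root_dev)
{
    root_dev->para->x = 25;
    root_dev->para->f = 30.12f;
}

// Fills *out with the values written by the kernel; returns 0 on success, -1 otherwise.
int fetch_para_host_alloc(struct Para *out)
{
    struct Root root = { 10.11f, 11.11f, NULL };
    struct Root *root_dev = NULL;
    int status = -1;

    // root.para holds a device address: pass it around on the host, never dereference it.
    if (cudaMalloc((void **) &root.para, sizeof(struct Para)) != cudaSuccess) return -1;
    if (cudaMalloc((void **) &root_dev, sizeof(struct Root)) != cudaSuccess) {
        cudaFree(root.para);
        return -1;
    }

    if (cudaMemcpy(root_dev, &root, sizeof(struct Root), cudaMemcpyHostToDevice) == cudaSuccess) {
        fill_ker<<<1, 1>>>(root_dev);
        // Copy the pointee back through the device pointer kept on the host.
        if (cudaMemcpy(out, root.para, sizeof(struct Para), cudaMemcpyDeviceToHost) == cudaSuccess)
            status = 0;
    }

    cudaFree(root.para);
    cudaFree(root_dev);
    return status;
}

int main()
{
    struct Para p;
    if (fetch_para_host_alloc(&p) != 0) {
        fprintf(stderr, "CUDA error\n");
        return 1;
    }
    printf("para.x = %d, para.f = %f\n", p.x, p.f);
    return 0;
}
```

Because the buffer comes from cudaMalloc, ordinary cudaMemcpy and cudaFree calls work on it, which keeps the host code simple.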

Also, where can I read more about memory handling in CUDA? Is there a specific book or article?

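Obviously you'll have to copy the new pointer from the device to the host. Keep in mind that it is a device address, so the host cannot dereference it directly (root->para->x in your printf is what segfaults); the Para contents have to be brought back from the device as well.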
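A sketch of keeping the in-kernel malloc and still getting the data to the host, assuming the same Para/Root layout as above (copy_and_free_ker and fetch_para_from_device_heap are hypothetical helpers). The pointer copied back in Root is a device-heap address, so the host can test its value but not dereference it. Memory from the in-kernel heap cannot be released with cudaFree, and as far as I know host-side calls such as cudaMemcpy are not supported on it either, so the sketch uses a second kernel to copy the contents into cudaMalloc'd staging memory and to free() the heap block on the device:

```cuda
// Sketch; build with: nvcc -arch=sm_20 copy_back.cu -o copy_back
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

struct Para { int x; float f; };
struct Root { float a, b; struct Para *para; };

// Same idea as the original kernel: allocate Para on the device heap and link it into Root.
__global__ void mlc_ker(struct Root *root_dev)
{
    struct Para *para_dev = (struct Para *) malloc(sizeof(struct Para));
    root_dev->para = para_dev;               // NULL if the device heap is exhausted
    if (para_dev == NULL) return;
    para_dev->x = 25;
    para_dev->f = 30.12f;
}

// Copies a device-heap Para into cudaMalloc'd staging memory, then frees the heap block.
__global__ void copy_and_free_ker(struct Para *heap_para, struct Para *staging)
{
    *staging = *heap_para;
    free(heap_para);                         // device-heap memory must be freed on the device
}

// Fills *out on success and returns 0; returns -1 on a CUDA error or failed allocation.
int fetch_para_from_device_heap(struct Para *out)
{
    struct Root root = { 10.11f, 11.11f, NULL };
    struct Root *root_dev = NULL;
    struct Para *staging = NULL;
    int status = -1;

    if (cudaMalloc((void **) &root_dev, sizeof(struct Root)) != cudaSuccess) return -1;
    if (cudaMalloc((void **) &staging, sizeof(struct Para)) != cudaSuccess) {
        cudaFree(root_dev);
        return -1;
    }

    if (cudaMemcpy(root_dev, &root, sizeof(struct Root), cudaMemcpyHostToDevice) == cudaSuccess) {
        mlc_ker<<<1, 1>>>(root_dev);
        // Copy the new pointer back: root.para now holds a device-heap address (or NULL).
        if (cudaMemcpy(&root, root_dev, sizeof(struct Root), cudaMemcpyDeviceToHost) == cudaSuccess
            && root.para != NULL) {
            copy_and_free_ker<<<1, 1>>>(root.para, staging);
            // staging came from cudaMalloc, so an ordinary cudaMemcpy can read it.
            if (cudaMemcpy(out, staging, sizeof(struct Para), cudaMemcpyDeviceToHost) == cudaSuccess)
                status = 0;
        }
    }

    cudaFree(staging);
    cudaFree(root_dev);
    return status;
}

int main()
{
    struct Para p;
    if (fetch_para_from_device_heap(&p) != 0) {
        fprintf(stderr, "CUDA error or device heap exhausted\n");
        return 1;
    }
    printf("para.x = %d, para.f = %f\n", p.x, p.f);
    return 0;
}
```

The extra kernel launch is the price of allocating on the device; if the size is known on the host, allocating with cudaMalloc up front avoids it.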

The standard reference is the CUDA C Programming Guide.