malloc inside a CUDA kernel: malloc for a pointer inside CUDA, declared in host code

sanf · January 5, 2012, 11:37am

Hi,

Does the malloc function work inside a CUDA kernel? Dynamic memory allocation inside a kernel may be possible, but perhaps not with the malloc function.

Can someone explain how it can be done and in what way?

The CUDA Development environment is:

nvidia-smi -q

…
…

GPU 0:2:0
Product Name : Tesla C2050 / C2070
Display Mode : Enabled
Persistence Mode : Disabled
…

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2011 NVIDIA Corporation
Built on Thu_May_12_11:09:45_PDT_2011
Cuda compilation tools, release 4.0, V0.2.1221

BTW, the following is the code snippet where dynamic allocation is required inside a CUDA kernel.

typedef struct Para
{

int x;
float f;
};

typedef struct Root
{

float a,b;
Para *para;
};

__global__ void mlc_ker(struct Root *root_dev)
{

struct Para *para_dev;

para_dev = (struct Para *) malloc(1*sizeof(struct Para));

root_dev->para = para_dev;
para_dev->x = 25;
para_dev->f = 30.12;

}

int main()
{

struct Root *root, *root_dev;

printf("\n Size of Root with Para pointer = %d \n", sizeof(Root));
root = (Root *) malloc(1*sizeof(Root));

root->a=10.11;
root->b=11.11;

cudaMalloc( (void **) &root_dev, 1*sizeof(Root));
cudaMemcpy(root_dev, root, 1*sizeof(Root), cudaMemcpyHostToDevice);

mlc_ker<<<1,1>>> (root_dev);

cudaMemcpy (root, root_dev, 1*sizeof(Root), cudaMemcpyDeviceToHost);

printf("\n Size of Root with Para pointer = %d \n", sizeof(Root));

printf("CONTENTS AFTER CUDA CALL : root->a = %f \n root->b = %f \n root->para->x = %d \n root->para->f = %f \n", root->a, root->b, root->para->x, root->para->f);

return 0;

}

Compilation produces the following error:

nvcc mem_alc.cu -o mem_alc

mem_alc.cu(9): warning: declaration requires a typedef name

mem_alc.cu(16): warning: declaration requires a typedef name

mem_alc.cu(24): error: calling a host function("malloc") from a device/global function("mlc_ker") is not allowed

1 error detected in the compilation of "/tmp/tmpxft_00002391_00000000-4_mem_alc.cpp1.ii".

Can someone explain the procedure for dynamic memory allocation inside a CUDA kernel? And is the above requirement achievable (i.e. allocating memory inside a CUDA kernel for a pointer declared in host code)?

Thanks in advance,

tera · January 5, 2012, 12:08pm

Make sure you compile for compute capability 2.0 (-arch sm_20).

And fix the declaration of struct Root:

typedef struct Root
{
    float a,b;
    struct Para *para;
};
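For reference, here is a minimal sketch of in-kernel malloc, assuming a device of compute capability 2.0 or higher and a build along the lines of nvcc -arch sm_20 as suggested above. The names heap_demo and run_heap_demo and the 16 MB heap size are made up for illustration. Device-side malloc draws from a separate device heap (8 MB by default), returns NULL when that heap is exhausted, and its allocations have to be released with free in device code, not with cudaFree.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread allocates a small scratch buffer on the device heap,
// writes to it, records whether that worked, and frees it again.
__global__ void heap_demo(int *ok)
{
    int *buf = (int *)malloc(4 * sizeof(int));   // device-side malloc
    if (buf == NULL) {                           // NULL: device heap exhausted
        ok[threadIdx.x] = 0;
        return;
    }
    for (int i = 0; i < 4; ++i)
        buf[i] = i;
    ok[threadIdx.x] = (buf[3] == 3) ? 1 : 0;
    free(buf);                                   // freed in device code, not via cudaFree
}

// Hypothetical helper: launches one block of nthreads threads and returns
// how many allocations succeeded, or -1 on a CUDA or host allocation error.
int run_heap_demo(int nthreads)
{
    // Best effort, assumed size: request a 16 MB device heap. The request may be
    // rejected once kernels have already run; the status is deliberately ignored.
    (void)cudaDeviceSetLimit(cudaLimitMallocHeapSize, 16u * 1024u * 1024u);

    int *ok_dev = NULL;
    if (cudaMalloc((void **)&ok_dev, nthreads * sizeof(int)) != cudaSuccess)
        return -1;

    heap_demo<<<1, nthreads>>>(ok_dev);

    int *ok = (int *)malloc(nthreads * sizeof(int));
    if (ok == NULL) {
        cudaFree(ok_dev);
        return -1;
    }

    // cudaMemcpy waits for the kernel to finish and reports launch/runtime errors.
    cudaError_t err = cudaMemcpy(ok, ok_dev, nthreads * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    int count = 0;
    if (err == cudaSuccess) {
        for (int i = 0; i < nthreads; ++i)
            count += ok[i];
    } else {
        count = -1;
    }

    free(ok);
    cudaFree(ok_dev);
    return count;
}
```

Calling run_heap_demo(32) from main would launch 32 threads, each doing its own small allocation. Checking the malloc result in the kernel matters because a failed allocation does not stop the kernel by itself.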

sanf · January 5, 2012, 12:46pm

Now compilation is successful.

But it's giving a segmentation fault. The reason may be that the Para pointer got its memory allocated inside the CUDA kernel, while in the host code no memory is allocated for the struct Para pointer.

__global__ void mlc_ker(struct Root *root_dev){
    struct Para *para_dev;

    para_dev = (struct Para *)  malloc(1*sizeof(struct Para));

    root_dev->para = para_dev;
    para_dev->x = 25;
    para_dev->f = 30.12;
}

But on the host, memory is allocated for structure Root only.

root = (Root *) malloc(1*sizeof(Root));

Also, where can I read more about memory handling in CUDA? Is there a specific book or article?

tera · January 5, 2012, 12:58pm

Obviously you'll have to copy the data behind the new pointer from the device to the host; the pointer itself is a device address that host code cannot dereference.

The standard reference is the CUDA C Programming Guide.
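One way to get the Para contents back to the host can be sketched as follows. A caveat documented in the Programming Guide applies: memory obtained from in-kernel malloc cannot be passed to runtime API calls such as cudaMemcpy, so the sketch stages the data through an ordinary cudaMalloc buffer using a second kernel. The names export_ker and fetch_para, the staging buffer, and the ok flag are hypothetical and not part of the original code. Device-heap allocations stay valid across kernel launches, which is what lets the second kernel read what mlc_ker allocated.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

struct Para { int x; float f; };
struct Root { float a, b; struct Para *para; };

// As in the thread: allocate Para on the device heap and fill it in.
__global__ void mlc_ker(struct Root *root_dev)
{
    struct Para *p = (struct Para *)malloc(sizeof(struct Para));
    root_dev->para = p;                 // device-heap address, meaningless on the host
    if (p != NULL) {
        p->x = 25;
        p->f = 30.12f;
    }
}

// Hypothetical second kernel: copy the heap-allocated Para into a
// cudaMalloc'ed staging buffer, then release the heap allocation.
__global__ void export_ker(struct Root *root_dev, struct Para *staging, int *ok)
{
    *ok = (root_dev->para != NULL) ? 1 : 0;
    if (*ok) {
        *staging = *root_dev->para;
        free(root_dev->para);
        root_dev->para = NULL;
    }
}

// Hypothetical host helper: fills *out with the device-side Para and points
// root->para at that host copy. Returns 0 on success, -1 on failure.
int fetch_para(struct Root *root, struct Para *out)
{
    struct Root *root_dev = NULL;
    struct Para *staging = NULL;
    int *ok_dev = NULL;
    int ok = 0;
    int rc = -1;

    if (cudaMalloc((void **)&root_dev, sizeof(struct Root)) == cudaSuccess &&
        cudaMalloc((void **)&staging, sizeof(struct Para)) == cudaSuccess &&
        cudaMalloc((void **)&ok_dev, sizeof(int)) == cudaSuccess &&
        cudaMemcpy(root_dev, root, sizeof(struct Root),
                   cudaMemcpyHostToDevice) == cudaSuccess) {

        mlc_ker<<<1, 1>>>(root_dev);
        export_ker<<<1, 1>>>(root_dev, staging, ok_dev);

        // These blocking copies also wait for both kernels to finish.
        if (cudaMemcpy(&ok, ok_dev, sizeof(int),
                       cudaMemcpyDeviceToHost) == cudaSuccess &&
            ok &&
            cudaMemcpy(out, staging, sizeof(struct Para),
                       cudaMemcpyDeviceToHost) == cudaSuccess &&
            cudaMemcpy(root, root_dev, sizeof(struct Root),
                       cudaMemcpyDeviceToHost) == cudaSuccess) {
            root->para = out;           // repoint at the host copy before dereferencing
            rc = 0;
        }
    }

    cudaFree(ok_dev);                   // cudaFree accepts NULL
    cudaFree(staging);
    cudaFree(root_dev);
    return rc;
}
```

After a successful call such as fetch_para(root, &para_host), printing root->para->x on the host no longer touches a device address, which is what caused the segmentation fault in the original program. If the size of the data is known before the launch, allocating it with cudaMalloc from the host and passing the pointer into the kernel avoids the staging step entirely.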
