Hi,
Does the malloc function work inside a CUDA kernel? Dynamic memory allocation inside a kernel may be possible, but perhaps not through the malloc function.
Can someone explain whether it can be done, and if so, how?
My CUDA development environment is:
nvidia-smi -q
…
…
GPU 0:2:0
Product Name : Tesla C2050 / C2070
Display Mode : Enabled
Persistence Mode : Disabled
…
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2011 NVIDIA Corporation
Built on Thu_May_12_11:09:45_PDT_2011
Cuda compilation tools, release 4.0, V0.2.1221
BTW, the following is the code snippet where dynamic allocation is required inside a CUDA kernel.
typedef struct Para
{
    int x;
    float f;
};
typedef struct Root
{
    float a,b;
    Para *para;
};
__global__ void mlc_ker(struct Root *root_dev)
{
    struct Para *para_dev;
    para_dev = (struct Para *) malloc(1*sizeof(struct Para));
    root_dev->para = para_dev;
    para_dev->x = 25;
    para_dev->f = 30.12;
}
int main()
{
    struct Root *root, *root_dev;
    printf("\n Size of Root with Para pointer = %d \n", sizeof(Root));
    root = (Root *) malloc(1*sizeof(Root));
    root->a=10.11;
    root->b=11.11;
    cudaMalloc( (void **) &root_dev, 1*sizeof(Root));
    cudaMemcpy(root_dev, root, 1*sizeof(Root), cudaMemcpyHostToDevice);
    mlc_ker<<<1,1>>> (root_dev);
    cudaMemcpy (root, root_dev, 1*sizeof(Root), cudaMemcpyDeviceToHost);
    printf("\n Size of Root with Para pointer = %d \n", sizeof(Root));
    printf("CONTENTS AFTER CUDA CALL : root->a = %f \n root->b = %f \n root->para->x = %d \n root->para->f = %f \n", root->a, root->b, root->para->x, root->para->f);
    return 0;
}
Compilation throws the following error:
nvcc mem_alc.cu -o mem_alc
mem_alc.cu(9): warning: declaration requires a typedef name
mem_alc.cu(16): warning: declaration requires a typedef name
mem_alc.cu(24): error: calling a host function("malloc") from a device/global function("mlc_ker") is not allowed
1 error detected in the compilation of "/tmp/tmpxft_00002391_00000000-4_mem_alc.cpp1.ii".
Can someone explain the procedure for dynamic memory allocation inside a CUDA kernel? And is the above requirement achievable, i.e. allocating memory inside a CUDA kernel for a pointer that belongs to a struct declared in host code?
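For reference, here is a minimal sketch of what I am trying to achieve. It rests on assumptions I have not verified on my setup. As far as I understand the CUDA C Programming Guide, device-side malloc() and free() exist only for compute capability 2.x and higher, so the file would have to be built with something like nvcc -arch=sm_20 mem_alc.cu -o mem_alc (the C2050 is a 2.0 device, and I believe nvcc 4.0 targets sm_10 by default). I also understand that a pointer returned by device-side malloc() cannot be passed to cudaMemcpy or cudaFree, and of course cannot be dereferenced on the host as my printf does. So the sketch copies the data into a cudaMalloc'ed buffer with a second kernel. The names alloc_ker, fetch_ker and staging_dev are hypothetical.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

struct Para { int x; float f; };
struct Root { float a, b; Para *para; };

// Allocates one Para on the device heap and links it into *root_dev.
// Assumption: compiled for compute capability 2.0 or higher.
__global__ void alloc_ker(Root *root_dev)
{
    Para *p = (Para *) malloc(sizeof(Para));   // device-side malloc
    if (p != NULL) {                           // returns NULL if the device heap is exhausted
        p->x = 25;
        p->f = 30.12f;
    }
    root_dev->para = p;
}

// Hypothetical helper: copies the device-heap Para into a buffer that was
// allocated with cudaMalloc, then releases the device-heap allocation.
__global__ void fetch_ker(Root *root_dev, Para *staging)
{
    if (root_dev->para != NULL) {
        *staging = *root_dev->para;
        free(root_dev->para);                  // device-heap memory is freed from device code
        root_dev->para = NULL;
    }
}

int main()
{
    Root root = {10.11f, 11.11f, NULL};
    Para para = {0, 0.0f};
    Root *root_dev = NULL;
    Para *staging_dev = NULL;

    cudaMalloc((void **) &root_dev, sizeof(Root));
    cudaMalloc((void **) &staging_dev, sizeof(Para));
    cudaMemset(staging_dev, 0, sizeof(Para));
    cudaMemcpy(root_dev, &root, sizeof(Root), cudaMemcpyHostToDevice);

    alloc_ker<<<1, 1>>>(root_dev);
    fetch_ker<<<1, 1>>>(root_dev, staging_dev);

    // Both copies use cudaMalloc'ed pointers only; they also wait for the kernels.
    cudaMemcpy(&root, root_dev, sizeof(Root), cudaMemcpyDeviceToHost);
    cudaError_t err = cudaMemcpy(&para, staging_dev, sizeof(Para), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("root.a = %f, root.b = %f, para.x = %d, para.f = %f\n",
           root.a, root.b, para.x, para.f);

    cudaFree(staging_dev);
    cudaFree(root_dev);
    return 0;
}
```

If the size of Para is known on the host before the launch, allocating it with cudaMalloc from the host and storing that pointer in root->para before copying Root to the device would avoid the device heap and the extra kernel altogether.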
Thanks in advance,