cudaMalloc from inside a kernel

xargon · September 2, 2009, 12:05pm

Hello,

Is it OK to call cudaMalloc from inside a kernel? I need to allocate memory for each of my kernel threads, and I was wondering whether cudaMalloc is fine for that or whether there is a better/faster way.

I would normally just give this a try, but the problem is that I am working on a PC that does not have a CUDA-enabled card :(

Many thanks,

xarg

avidday · September 2, 2009, 12:29pm

Kernels can’t dynamically allocate memory. All of the CUDA runtime API functions are host functions only.

LSChien · September 2, 2009, 12:36pm

No, you cannot call cudaMalloc inside any kernel.

Just allocate the device memory from host code and pass the pointers to the kernel.

The following code is adapted from the programming guide:

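[codebox]#include <stdlib.h>
#include <cuda_runtime.h>

// Device code: each thread adds one pair of elements
__global__ void VecAdd(const float* A, const float* B, float* C, int N)
{
    // Global index, because the launch below uses more than one block
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N)
        C[i] = A[i] + B[i];
}

// Host code
int main()
{
    int N = 1024;   // example size (assumed)
    size_t size = N * sizeof(float);

    // Allocate and fill the input vectors h_A and h_B in host memory
    float* h_A = (float*)malloc(size);
    float* h_B = (float*)malloc(size);
    float* h_C = (float*)malloc(size);
    for (int i = 0; i < N; ++i) { h_A[i] = (float)i; h_B[i] = 2.0f * i; }

    // Allocate vectors in device memory
    float* d_A;
    cudaMalloc((void**)&d_A, size);
    float* d_B;
    cudaMalloc((void**)&d_B, size);
    float* d_C;
    cudaMalloc((void**)&d_C, size);

    // Copy vectors from host memory to device memory
    cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice);

    // Invoke kernel
    int threadsPerBlock = 256;
    int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
    VecAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, N);

    // Copy result from device memory to host memory
    // h_C contains the result in host memory
    cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost);

    // Free device memory
    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_C);

    // Free host memory
    free(h_A);
    free(h_B);
    free(h_C);
    return 0;
}[/codebox]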

xargon · September 2, 2009, 1:02pm

Thanks for the reply.

This is a problem though. I have the following:

[codebox]
bool* myArray = (bool*)malloc(totalSize * sizeof(bool));

for (int i = 0; i < sizeX; ++i)
{
    for (int j = 0; j < sizeY; ++j)
    {
        for (int k = 0; k < sizeZ; ++k)
        {
            // Some processing that uses myArray as scratch space

            // Reset the scratch array for the next iteration
            memset(myArray, 0, totalSize * sizeof(bool));
        }
    }
}

free(myArray);
[/codebox]

Now, this does not translate easily into the kernel, unless each thread has access to some exclusive memory. I guess I have to create one massive array and give each thread an offset into it…

Cheers,

xarg
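
The "one massive array plus a per-thread offset" idea could look roughly like the sketch below. It is only an illustration under stated assumptions: the names `processCells`, `scratchOffset`, `scratchPerThread` and `results` are hypothetical, the sizes are made up, and the "processing" is a placeholder that marks every other slot and counts them. The only point is that the host makes a single cudaMalloc call and each thread works in its own slice.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: start of a thread's private slice in the big scratch buffer.
__host__ __device__ inline size_t scratchOffset(size_t threadId, size_t scratchPerThread)
{
    return threadId * scratchPerThread;
}

// Hypothetical kernel: one thread per (i, j, k) cell, flattened to a 1D index.
__global__ void processCells(bool* scratch, size_t scratchPerThread,
                             int* results, int numCells)
{
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid >= numCells)
        return;

    // This thread's exclusive region; no other thread touches it.
    bool* myArray = scratch + scratchOffset(tid, scratchPerThread);

    // Takes the place of the host-side memset (cudaMalloc does not zero memory).
    for (size_t n = 0; n < scratchPerThread; ++n)
        myArray[n] = false;

    // Placeholder for "some processing": mark every other slot, then count marks.
    for (size_t n = 0; n < scratchPerThread; n += 2)
        myArray[n] = true;
    int count = 0;
    for (size_t n = 0; n < scratchPerThread; ++n)
        count += myArray[n] ? 1 : 0;
    results[tid] = count;
}

int main()
{
    const int sizeX = 4, sizeY = 4, sizeZ = 4;   // assumed example sizes
    const int numCells = sizeX * sizeY * sizeZ;
    const size_t scratchPerThread = 8;           // assumed stand-in for totalSize

    // One allocation from host code covers every thread's scratch space.
    bool* d_scratch;
    cudaMalloc((void**)&d_scratch, numCells * scratchPerThread * sizeof(bool));
    int* d_results;
    cudaMalloc((void**)&d_results, numCells * sizeof(int));

    int threadsPerBlock = 64;
    int blocksPerGrid = (numCells + threadsPerBlock - 1) / threadsPerBlock;
    processCells<<<blocksPerGrid, threadsPerBlock>>>(d_scratch, scratchPerThread,
                                                     d_results, numCells);

    // Copy the per-cell results back; cudaMemcpy waits for the kernel to finish.
    int* h_results = (int*)malloc(numCells * sizeof(int));
    cudaMemcpy(h_results, d_results, numCells * sizeof(int), cudaMemcpyDeviceToHost);
    printf("cell 0 count: %d\n", h_results[0]);

    free(h_results);
    cudaFree(d_results);
    cudaFree(d_scratch);
    return 0;
}
```

Two things to keep in mind with this layout: the buffer grows as the number of threads times the scratch size per thread, so it can get large quickly, and giving each thread a contiguous slice is the simplest indexing but usually not the best for coalescing. An interleaved layout (element n of thread tid at `n * numCells + tid`) is a common alternative if memory throughput turns out to matter.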
