Memory allocation from a device function?

codercat · April 11, 2007, 11:37am

I am doing a project involving CUDA for my Masters thesis and I have run into a bit of a problem that I hope someone can help me out with. What I would like to do is to make a pointer and call something similar to malloc() on it from inside a device function. I have tried to simply create the variable and then take its address when using it as an argument to a function call, but it seems to create problems when I do that. Basically, what I want is a chunk of memory that can only be seen by one thread, that can be passed between device functions by pointer so that the data can be manipulated inside the function call. The memory would be created for each thread, but different data would be put into it.

Right now, to get around the problem, I am creating an array of the variable type on the host and calling cudaMalloc() to create an array with one element per thread. I then index the array from the device using the threadID. This solution feels like a bit of a hack to me and I would prefer to create the memory as needed.
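For readers following along, the workaround described above might look roughly like the sketch below. It is only an illustration: the kernel name `foo`, the thread count `n`, the launch configuration, and the values written by `bar` are assumptions made for the example, not code from the project.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

struct aStruct {
  int a;
  int b;
};

// Device function that manipulates one thread's struct through a pointer.
__device__ void bar(aStruct *s, int tid)
{
  s->a = tid;      // illustrative values; each thread writes different data
  s->b = 2 * tid;
}

// Each thread selects its own element of the preallocated array.
__global__ void foo(aStruct *perThread, int n)
{
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  if (tid < n) {
    bar(&perThread[tid], tid);
  }
}

int main()
{
  const int n = 256;  // hypothetical total thread count
  aStruct *d_structs = NULL;

  // One element per thread, allocated from the host before the launch.
  if (cudaMalloc((void **)&d_structs, n * sizeof(aStruct)) != cudaSuccess) {
    fprintf(stderr, "cudaMalloc failed\n");
    return 1;
  }

  foo<<<(n + 63) / 64, 64>>>(d_structs, n);

  // Copy the results back; this also reports errors from the kernel launch.
  static aStruct h_structs[n];
  cudaError_t err = cudaMemcpy(h_structs, d_structs, n * sizeof(aStruct),
                               cudaMemcpyDeviceToHost);
  if (err != cudaSuccess) {
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    cudaFree(d_structs);
    return 1;
  }

  printf("thread 5 wrote a=%d b=%d\n", h_structs[5].a, h_structs[5].b);
  cudaFree(d_structs);
  return 0;
}
```

The array lives in global memory, so nothing but the indexing convention keeps one thread from touching another thread's element.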

Here is a host code example of what I would like to accomplish:

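#include <stdlib.h>

struct aStruct {
  int a;
  int b;
};

/* Fills in the struct through the pointer it is given. */
void bar(struct aStruct * myStruct)
{
  myStruct->a = 1;
  myStruct->b = 2;
}

void foo()
{
  struct aStruct * myStruct = (struct aStruct*)malloc(sizeof(struct aStruct));
  bar(myStruct);
  free(myStruct);
}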

On a related note, I would eventually like to be able to create data structures such as linked lists and trees on the GPU. I understand that this may be difficult due to the fact that the order of execution of the threads is not guaranteed, but I would like to know if it is possible, and if so, what is the preferred method?

Thanks

bbudge · April 11, 2007, 6:09pm

You can use local memory; however, dynamic allocation is not (in my understanding) allowed.
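A minimal sketch of the local-memory route, assuming the compiler in use accepts taking the address of a per-thread variable and passing it to a `__device__` function. The kernel name `foo`, the `out` array, and the launch configuration are made up for the example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

struct aStruct {
  int a;
  int b;
};

// Works on whichever struct the calling thread passes in.
__device__ void bar(aStruct *myStruct)
{
  myStruct->a = 1;
  myStruct->b = 2;
}

__global__ void foo(int *out, int n)
{
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  if (tid >= n) return;

  // Declared inside the kernel, so every thread gets its own private copy.
  // The compiler places it in registers or per-thread local memory.
  aStruct myStruct;
  bar(&myStruct);

  // Write something to global memory so the result is observable on the host.
  out[tid] = myStruct.a + myStruct.b;
}

int main()
{
  const int n = 128;  // hypothetical thread count
  int *d_out = NULL;
  if (cudaMalloc((void **)&d_out, n * sizeof(int)) != cudaSuccess) {
    fprintf(stderr, "cudaMalloc failed\n");
    return 1;
  }

  foo<<<(n + 63) / 64, 64>>>(d_out, n);

  static int h_out[n];
  cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(int),
                               cudaMemcpyDeviceToHost);
  if (err != cudaSuccess) {
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    cudaFree(d_out);
    return 1;
  }

  printf("out[0] = %d\n", h_out[0]);
  cudaFree(d_out);
  return 0;
}
```

The size of the struct is fixed at compile time here, which is the main difference from a real malloc().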

Building data structures like this will be very hard on the GPU. First, you need to have enough memory already allocated, because dynamic allocation isn’t allowed in device functions. Second, as you mention, you need to know what other threads are doing.

What I’ve suggested to a colleague for his situation is to run only one block with one thread to build the data structure. It’s likely that he’d be better off building it on the CPU and copying it to the device, though.
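A rough sketch of the one-block, one-thread idea, assuming a node pool that the host has already allocated with cudaMalloc(). The `Node` type and the `buildList` and `sumList` kernels are hypothetical names for illustration, not my colleague's code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

struct Node {
  int   value;
  Node *next;
};

// Launched with a single thread: links the preallocated nodes into a list.
__global__ void buildList(Node *pool, int n, Node **head)
{
  Node *first = NULL;
  for (int i = n - 1; i >= 0; --i) {
    pool[i].value = i;
    pool[i].next  = first;  // device pointers are valid across kernel launches
    first = &pool[i];
  }
  *head = first;
}

// Also a single thread: walks the list to show it survives between launches.
__global__ void sumList(Node **head, int *sum)
{
  int total = 0;
  for (Node *p = *head; p != NULL; p = p->next) {
    total += p->value;
  }
  *sum = total;
}

int main()
{
  const int n = 8;  // hypothetical pool size, chosen up front
  Node  *d_pool = NULL;
  Node **d_head = NULL;
  int   *d_sum  = NULL;

  // All memory is allocated from the host before any kernel runs.
  if (cudaMalloc((void **)&d_pool, n * sizeof(Node)) != cudaSuccess ||
      cudaMalloc((void **)&d_head, sizeof(Node *))   != cudaSuccess ||
      cudaMalloc((void **)&d_sum,  sizeof(int))      != cudaSuccess) {
    fprintf(stderr, "cudaMalloc failed\n");
    return 1;
  }

  buildList<<<1, 1>>>(d_pool, n, d_head);  // one block, one thread
  sumList<<<1, 1>>>(d_head, d_sum);

  int h_sum = 0;
  cudaError_t err = cudaMemcpy(&h_sum, d_sum, sizeof(int),
                               cudaMemcpyDeviceToHost);
  if (err != cudaSuccess) {
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    return 1;
  }

  printf("sum of list values = %d\n", h_sum);
  cudaFree(d_pool);
  cudaFree(d_head);
  cudaFree(d_sum);
  return 0;
}
```

A single thread leaves the rest of the GPU idle, which is why building on the CPU is probably the better choice.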

Brian
