How to allocate dynamic memory on device?


I am new to CUDA and have the following question:

How can I allocate dynamic memory inside a __global__ function on the graphics card (the device), do something with it, go back to the host, and then return to the device and use the same memory again?

E.g., creating something like a list on the graphics card:


1. Invoke a kernel, e.g., call __global__ void func().
2. Dynamically allocate an object “element1” of type “struct ListElement” and save the pointer to it (on the device).
3. Return to the host and do some work there.
4. Invoke func() again, create an object “element2”, and set element1->next = element2.
5. Call func() and destroy all elements in the list.

Thanks a lot.

Best regards


You can’t, unless you pre-allocate a large block of memory in which you run your own memory allocation scheme using atomic functions.
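To make the pre-allocated-block idea concrete, here is a minimal sketch, not a definitive implementation. It assumes a simple bump allocator: the host reserves one block with cudaMalloc, and device code hands out pieces of it by advancing an offset with atomicAdd. The names Pool, poolAlloc, appendElement, sumList and destroyList are hypothetical and made up for illustration. Individual elements are never freed; "destroying" the list just resets the pool.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

struct ListElement {
    int value;
    ListElement *next;
};

// Hypothetical pool descriptor. It lives in device memory, so the list
// state survives between kernel launches.
struct Pool {
    char *base;             // start of the pre-allocated block
    unsigned int capacity;  // size of the block in bytes
    unsigned int offset;    // next free byte, advanced atomically
    ListElement *head;      // first list element (or nullptr)
    ListElement *tail;      // last list element (or nullptr)
};

// Bump allocation: reserve `bytes` from the pool with one atomicAdd.
// Returns nullptr when the pool is exhausted.
__device__ void *poolAlloc(Pool *pool, unsigned int bytes) {
    bytes = (bytes + 15u) & ~15u;  // round up to keep 16-byte alignment
    unsigned int old = atomicAdd(&pool->offset, bytes);
    if (old + bytes > pool->capacity) return nullptr;
    return pool->base + old;
}

// Appends one element. Only a single thread touches the list here; the
// atomic allocator would also work with many threads, but linking the
// nodes concurrently would need extra synchronization.
__global__ void appendElement(Pool *pool, int value) {
    if (blockIdx.x != 0 || threadIdx.x != 0) return;
    ListElement *e = (ListElement *)poolAlloc(pool, sizeof(ListElement));
    if (e == nullptr) return;
    e->value = value;
    e->next = nullptr;
    if (pool->tail) pool->tail->next = e; else pool->head = e;
    pool->tail = e;
}

// Walks the list built by earlier launches and sums the values.
__global__ void sumList(const Pool *pool, int *result) {
    int sum = 0;
    for (const ListElement *e = pool->head; e; e = e->next) sum += e->value;
    *result = sum;
}

// "Destroys" the list by resetting the pool.
__global__ void destroyList(Pool *pool) {
    pool->head = pool->tail = nullptr;
    pool->offset = 0;
}

int main() {
    const unsigned int capacity = 1 << 20;  // 1 MiB, an arbitrary choice
    char *block; Pool *pool; int *dResult;
    cudaMalloc(&block, capacity);
    cudaMalloc(&pool, sizeof(Pool));
    cudaMalloc(&dResult, sizeof(int));
    Pool h = {block, capacity, 0, nullptr, nullptr};
    cudaMemcpy(pool, &h, sizeof(Pool), cudaMemcpyHostToDevice);

    appendElement<<<1, 1>>>(pool, 1);   // creates element1
    cudaDeviceSynchronize();            // back on the host; do other work here
    appendElement<<<1, 1>>>(pool, 2);   // creates element2, links element1->next
    sumList<<<1, 1>>>(pool, dResult);
    int sum = 0;
    cudaMemcpy(&sum, dResult, sizeof(int), cudaMemcpyDeviceToHost);
    printf("sum = %d\n", sum);          // values 1 and 2 were stored

    destroyList<<<1, 1>>>(pool);
    cudaDeviceSynchronize();
    cudaFree(dResult); cudaFree(pool); cudaFree(block);
    return 0;
}
```

The point of the sketch is that memory obtained with cudaMalloc stays valid until cudaFree, so pointers stored in it (like head, tail and next) can be reused by later kernel launches. A real allocator would also need per-element free, which is where the scheme gets considerably more involved.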

Take a look at this:

This was done by Ian Buck and Stephen Jones at NVIDIA, so it seems to me, at least, that the writing is on the wall for this to appear in some newer version of CUDA.

If you are impatient you could probably implement it yourself.

It certainly needs to appear in a future CUDA release to make good on the promised full support for C++ new and delete in the Fermi white paper. (Neat that they did this on the C1060 though! Would like to see such things on compute capability < 2.0.)