Using a linked list data structure in a kernel

Does CUDA support complex data structures, such as linked lists, in kernels?



In principle it is possible to create a linked list in device memory, since pointers are supported. However, loading a linked list onto the device will be very time consuming, because a host pointer is not valid on the device and you will need to translate host pointers to device pointers at every node. Moreover, walking the list means one dependent pointer dereference per node, usually to a scattered address, which leads to very poor memory performance. It is best to convert your lists into fixed-size arrays before loading them onto the CUDA device.
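As a rough illustration of the conversion, here is a minimal sketch that flattens a host-side list into contiguous arrays and then processes the payload with one thread per element. The names HostNode, FlatList, flatten, scale and scaleOnDevice are made up for this example, the float payload is an assumption, and error checking on the CUDA calls is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical host-side list node; the names are illustrative only.
struct HostNode {
    float     value;
    HostNode* next;
};

// Array form of the list: node i's payload is values[i] and its successor is
// next[i], with -1 marking the end. Indices mean the same thing on host and
// device, so no pointer translation is needed.
struct FlatList {
    std::vector<float> values;
    std::vector<int>   next;
};

// Walk the host list once and copy it into contiguous arrays.
// Nodes are stored in traversal order, so next[i] is i + 1 (or -1 at the tail).
FlatList flatten(const HostNode* head) {
    FlatList flat;
    for (const HostNode* n = head; n != nullptr; n = n->next) {
        flat.values.push_back(n->value);
        flat.next.push_back(n->next ? static_cast<int>(flat.values.size()) : -1);
    }
    return flat;
}

// One thread per element: neighbouring threads read neighbouring addresses,
// so the loads can coalesce. No pointer chasing happens on the device.
__global__ void scale(float* values, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) values[i] *= factor;
}

// Copy the flattened payload to the device, run the kernel, copy it back.
void scaleOnDevice(FlatList& flat, float factor) {
    int n = static_cast<int>(flat.values.size());
    if (n == 0) return;
    float* d_values = nullptr;
    cudaMalloc(&d_values, n * sizeof(float));
    cudaMemcpy(d_values, flat.values.data(), n * sizeof(float),
               cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(d_values, n, factor);
    cudaMemcpy(flat.values.data(), d_values, n * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(d_values);
}
```

The next array is only needed if the kernel has to follow the original list order explicitly. When the nodes are stored in traversal order, as here, the device can often ignore it and just index values directly.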


How do I allocate memory and release it? Can I use malloc and free in a kernel function?

Allocating memory is not directly supported within kernels on early CUDA hardware. (Devices of compute capability 2.0 and later do accept malloc and free in device code, served from a fixed-size device heap.) Either way, it is possible to pass in a large block of memory and carve it up within the kernel. I implemented a test that does this, just to see if it would work. But this approach is generally not recommended. Accesses to linked structures will almost never coalesce, for starters, and you must use atomics, at least for allocation, to prevent threads from stepping on each other. If multiple threads simultaneously attempt to modify a linked structure, God help you.
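To make the "carve up a large block" idea concrete, here is a minimal sketch of one way it could look. It is not the test mentioned above. The names Node, allocNode, buildList and buildAndCount are hypothetical, links are stored as indices into the pool rather than as pointers, and error checking is omitted. Allocation uses atomicAdd on a shared counter, and each thread pushes its node onto the front of a shared list with atomicExch.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical node type. Links are indices into the pool, -1 means "none".
struct Node {
    int value;
    int next;
};

// Bump allocator: atomicAdd hands every caller a distinct slot index, so
// threads cannot step on each other. Returns -1 once the pool is exhausted.
// Nothing is freed individually; the whole pool is released at once.
__device__ int allocNode(int* counter, int capacity) {
    int slot = atomicAdd(counter, 1);
    return slot < capacity ? slot : -1;
}

// Each thread allocates one node and pushes it onto the front of a shared
// list. atomicExch installs the new head and returns the previous one.
// The list is only safe to traverse after the kernel has finished, because a
// node becomes the head before its next field has been written.
__global__ void buildList(Node* pool, int* counter, int capacity,
                          int* head, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;
    int slot = allocNode(counter, capacity);
    if (slot < 0) return;  // pool exhausted, this thread adds nothing
    pool[slot].value = tid;
    pool[slot].next = atomicExch(head, slot);
}

// Host helper: runs buildList with n threads on a pool of the given capacity
// and returns the list length found by walking the links on the host.
int buildAndCount(int n, int capacity) {
    Node* d_pool = nullptr;
    int*  d_counter = nullptr;
    int*  d_head = nullptr;
    cudaMalloc(&d_pool, capacity * sizeof(Node));
    cudaMalloc(&d_counter, sizeof(int));
    cudaMalloc(&d_head, sizeof(int));
    int zero = 0, empty = -1;
    cudaMemcpy(d_counter, &zero, sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_head, &empty, sizeof(int), cudaMemcpyHostToDevice);

    buildList<<<(n + 127) / 128, 128>>>(d_pool, d_counter, capacity, d_head, n);

    std::vector<Node> pool(capacity);
    int head = -1;
    cudaMemcpy(pool.data(), d_pool, capacity * sizeof(Node),
               cudaMemcpyDeviceToHost);
    cudaMemcpy(&head, d_head, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_pool);
    cudaFree(d_counter);
    cudaFree(d_head);

    int count = 0;
    for (int i = head; i != -1; i = pool[i].next) ++count;
    return count;
}
```

With enough capacity, buildAndCount(1000, 1024) should find all 1000 nodes. With a pool of only 100 slots, the remaining threads get -1 from allocNode and the list stops at 100. The order of the nodes depends on thread scheduling and is not deterministic, and neighbouring threads still end up reading and writing scattered slots, which is part of why this approach performs poorly.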

You will get much better performance if you can somehow get your data to fit into arrays.