Build Tree on Device: How to create a tree in parallel on CUDA

Being a noob at CUDA programming, I would like to know:

  1. Can we allocate memory on the device at runtime (i.e., call cudaMalloc inside a kernel function)?
    <I somehow guess that the answer is no! :/ >

  2. How can I create, say, N trees in parallel on CUDA?
    Based on my requirements, I need to build one tree per thread. So if I have N threads, I should end up with N trees!

A code snippet might help a lot!
Thanks in advance.

  1. You can use either 'malloc' or the 'new' operator inside kernel code, provided your GPU has compute capability 2.x or higher, as far as I remember.

  2. If each tree is accessed by only a single thread, you could create the trees in thread-local memory by defining a fixed-size array of tree nodes (this article may help: Array Implementation of Trees). The same can be done in global or shared memory, depending on how many threads need to access a single tree.
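
For point 1, here is a minimal sketch of in-kernel allocation, assuming a GPU with compute capability 2.x or higher. The kernel name perThreadAlloc, the host helper runPerThreadAlloc, and the 16 MB heap size are made up for illustration. In-kernel malloc draws from a device heap that is 8 MB by default; cudaDeviceSetLimit with cudaLimitMallocHeapSize resizes it and has to be called before the first kernel that uses malloc is launched.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread allocates its own scratch buffer from the device heap,
// fills it, sums it, and frees it again.
__global__ void perThreadAlloc(int* out, int nThreads, int nItems) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= nThreads) return;

    int* buf = (int*)malloc(nItems * sizeof(int));
    if (buf == nullptr) {      // device heap exhausted
        out[t] = -1;
        return;
    }
    int sum = 0;
    for (int i = 0; i < nItems; ++i) {
        buf[i] = t + i;
        sum += buf[i];
    }
    out[t] = sum;
    free(buf);                 // memory from in-kernel malloc is freed in device code
}

// Hypothetical host helper: launches the kernel and copies the sums back.
bool runPerThreadAlloc(int nThreads, int nItems, std::vector<int>& out) {
    // Illustrative heap size; set before the first launch that uses malloc.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 16 * 1024 * 1024);

    int* dOut = nullptr;
    if (cudaMalloc(&dOut, nThreads * sizeof(int)) != cudaSuccess) return false;

    int threads = 128;
    int blocks = (nThreads + threads - 1) / threads;
    perThreadAlloc<<<blocks, threads>>>(dOut, nThreads, nItems);

    out.resize(nThreads);
    bool ok = cudaMemcpy(out.data(), dOut, nThreads * sizeof(int),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dOut);
    return ok;
}

int main() {
    std::vector<int> sums;
    if (!runPerThreadAlloc(256, 8, sums)) {
        printf("CUDA error\n");
        return 1;
    }
    printf("thread 3 sum: %d\n", sums[3]);
    return 0;
}
```

Many small per-thread allocations tend to be slow, so this is mostly useful when the required size is not known before the launch.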
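
For point 2, a rough sketch of the array-of-nodes idea could look like this. It uses the global-memory variant so the host can read the trees back: one pool is allocated with cudaMalloc, and thread t builds a binary search tree in its own slice of that pool. The Node layout, the names insertKey, buildTrees and buildOnDevice, and the key formula are all assumptions for illustration; real keys would come from your own data.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Children are indices into the owning thread's slice; -1 means "no child".
struct Node {
    int key;
    int left;
    int right;
};

// Append `key` as a new node and link it into the binary search tree
// stored in nodes[0..count). Node 0 is the root.
__device__ void insertKey(Node* nodes, int& count, int key) {
    int slot = count++;
    nodes[slot].key = key;
    nodes[slot].left = -1;
    nodes[slot].right = -1;
    if (slot == 0) return;
    int cur = 0;
    while (true) {
        int& next = (key < nodes[cur].key) ? nodes[cur].left : nodes[cur].right;
        if (next == -1) { next = slot; return; }
        cur = next;
    }
}

// One thread builds one tree in its private slice of the pool.
__global__ void buildTrees(Node* pool, int* sizes, int nTrees, int nodesPerTree) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= nTrees) return;
    Node* nodes = pool + (size_t)t * nodesPerTree;
    int count = 0;
    for (int i = 0; i < nodesPerTree; ++i) {
        int key = (i * 7 + t) % nodesPerTree;  // made-up keys for the demo
        insertKey(nodes, count, key);
    }
    sizes[t] = count;
}

// Hypothetical host helper: allocates the pool, launches, copies trees back.
bool buildOnDevice(int nTrees, int nodesPerTree,
                   std::vector<Node>& pool, std::vector<int>& sizes) {
    size_t nNodes = (size_t)nTrees * nodesPerTree;
    Node* dPool = nullptr;
    int* dSizes = nullptr;
    if (cudaMalloc(&dPool, nNodes * sizeof(Node)) != cudaSuccess) return false;
    if (cudaMalloc(&dSizes, nTrees * sizeof(int)) != cudaSuccess) {
        cudaFree(dPool);
        return false;
    }
    int threads = 128;
    int blocks = (nTrees + threads - 1) / threads;
    buildTrees<<<blocks, threads>>>(dPool, dSizes, nTrees, nodesPerTree);

    pool.resize(nNodes);
    sizes.resize(nTrees);
    bool ok = cudaMemcpy(pool.data(), dPool, nNodes * sizeof(Node),
                         cudaMemcpyDeviceToHost) == cudaSuccess
           && cudaMemcpy(sizes.data(), dSizes, nTrees * sizeof(int),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dPool);
    cudaFree(dSizes);
    return ok;
}

int main() {
    std::vector<Node> pool;
    std::vector<int> sizes;
    if (!buildOnDevice(64, 16, pool, sizes)) {
        printf("CUDA error\n");
        return 1;
    }
    printf("tree 5: %d nodes, root key %d\n", sizes[5], pool[5 * 16].key);
    return 0;
}
```

Because every thread touches only its own slice, no synchronization is needed. If the trees are small and never leave the kernel, the same insertKey routine should also work on a fixed-size Node array declared inside the kernel, which is the thread-local variant.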