Tree structure in CUDA

I am working on a project where I need to use a tree structure.
In the tree, every node has 10 children except for the leaves. The idea is that the root node does some calculation on the GPU, then splits its result into 10 parts so that each child can do its own calculation, which also needs to run on the GPU. This continues for every node until a leaf is reached; voting happens at the leaf level.

I am used to programming in a very object-oriented way (coming from Java and C#). My first thought was to make an object for every node, have each node launch a CUDA kernel, and then pass its result to its child nodes, which would in turn launch their own kernels. However, I am afraid this is too naive and will not work well in CUDA. I would like to hear any comments on this, or whether there is a known pattern for this kind of problem. A rough sketch of what I had in mind is below.
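
A minimal sketch of that idea, purely for illustration (Node, computeKernel, and the sizes are made-up names, not existing code):

```cuda
#include <cuda_runtime.h>

// Stand-in for whatever work a single node actually does.
__global__ void computeKernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 0.5f;
}

struct Node {
    float *d_result;        // this node's output, in device memory
    Node  *children[10];    // all null for a leaf

    void compute(const float *d_input, int n) {
        // one small kernel launch per node
        computeKernel<<<(n + 255) / 256, 256>>>(d_input, d_result, n);
        cudaDeviceSynchronize();

        // pass one tenth of the result down to each child
        for (int i = 0; i < 10; ++i)
            if (children[i])
                children[i]->compute(d_result + i * (n / 10), n / 10);
    }
};
```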

This seems to me like task parallelism, which is not a good match for GPUs: at any one node you only hand out 10 tasks, which is far too small a workload to keep the device busy.
If you can somehow merge the computation into one kernel, for example by processing an entire level of the tree per launch, then it becomes feasible.
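
A minimal sketch of that level-by-level idea, under some assumptions: the tree is stored implicitly in flat arrays (node i on a level has parent i / 10 on the level above), and processLevel, RESULT_SIZE, and the placeholder arithmetic are invented for illustration, not taken from your code.

```cuda
#include <cuda_runtime.h>

#define BRANCH 10        // children per node
#define RESULT_SIZE 100  // floats produced per node (arbitrary, divisible by BRANCH)

// One thread per node of the current level: each node reads its tenth of
// its parent's result and writes a result of its own.
__global__ void processLevel(const float *parentResults,
                             float *levelResults,
                             int numNodes)
{
    int node = blockIdx.x * blockDim.x + threadIdx.x;
    if (node >= numNodes) return;

    int parent = node / BRANCH;   // index of parent on the previous level
    int slice  = node % BRANCH;   // which tenth of the parent's output

    const float *in = parentResults + parent * RESULT_SIZE
                      + slice * (RESULT_SIZE / BRANCH);
    float *out = levelResults + node * RESULT_SIZE;

    // Placeholder for the real per-node computation.
    for (int j = 0; j < RESULT_SIZE; ++j)
        out[j] = in[j % (RESULT_SIZE / BRANCH)] * 0.5f;
}

// Walk the tree breadth-first with one kernel launch per level.
void runTree(float *d_root, int depth)
{
    float *d_in = d_root;   // results of level 0 (the root)
    int numNodes = 1;

    for (int level = 1; level <= depth; ++level) {
        numNodes *= BRANCH;
        float *d_out;
        cudaMalloc(&d_out, numNodes * RESULT_SIZE * sizeof(float));

        int threads = 256;
        int blocks  = (numNodes + threads - 1) / threads;
        processLevel<<<blocks, threads>>>(d_in, d_out, numNodes);

        if (d_in != d_root) cudaFree(d_in);
        d_in = d_out;       // this level's results feed the next level
    }
    // d_in now holds the leaf-level results, ready for the voting step
    // (freeing it is left to the caller in this sketch).
}
```

The first couple of launches still have only a handful of nodes, but a few levels down each launch covers thousands of independent nodes (10^4 = 10,000 at level 4), which is the kind of data-parallel workload the GPU wants.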