Build (or) Send a Tree on/to the GPU

Alex00 · August 28, 2013, 10:32am

Hi,

I am currently looking at tree and have the following structure

typedef struct nd{

    short int final;
    char number; 
    struct nd *child[2]; 

}node;

So my plan was to build the tree on the GPU, I get how to allocate memory, since I know exactly how many nodes I will have in there and can therefor avoid synchronisation, although I don’t see how to build the tree on the GPU.

Can I simply make some code like :

__global__ void Tree((node * tree, int size)) {

--> That's the part where I'm actually stuck.  How can I build my tree here 

}

int main(int argc, char **argv){

[...]

        cudaMalloc( (void**)&treeCUDA, sizeof(node)*256*256 );

	Tree<<<1,256>>>(treeCUDA, size);


[...]

}

Thanks for the help, I have also read that a tree could be contained in an array, are there any examples anywhere on how to do that ?

Thanks.

njuffa · August 28, 2013, 6:32pm

The tree-on-top-of-array approach seems worth exploring. In the context of some embedded software, where lists had to maintain excellent spatial locality for performance reasons, as well as being fast to save and restore, I have in the past successfully used a list-on-top-of-array approach.

By the same token, trees can be mapped to arrays. For a binary tree, Wikipedia describes one simple storage scheme that can be used (one can of course devise various alternate schemes):

[url]Binary tree - Wikipedia

Note I have not used trees on the GPU in my own work, so I am not claiming the tree-on-top-of-array approach is the way to go, just that it merits investigation.

Alex00 · August 28, 2013, 7:15pm

Hi njuffa, I have looked at that as well, but I might have up to 256 children in the future, though I will implement both approaches to see what gives the best performances.

Thanks for the reply.

MutantJohn · August 28, 2013, 9:46pm

I don’t think you can actually build a tree on the device, can you? I thought cudaMalloc() was a host function only so the only thing you could do is allocate your tree on the device through the host and then process your allocated memory with the device.

njuffa · August 28, 2013, 10:55pm

(1) For information on device-side malloc() and free(), check out the CUDA C Programming Guide at http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html

B.18. Dynamic Global Memory Allocation and Operations
Dynamic global memory allocation and operations are only supported by devices of compute capability 2.x and higher. […]

(2) The building of trees or other data structures does not necessarily require any kind of dynamic memory allocation. For example, programmers were doing in Fortran 77, i.e. back when Fortran had no support for dynamic memory allocation.

MutantJohn · August 28, 2013, 11:08pm

Oh wow, I’m a nublet.

Hey, thanks though. I’ma actually try making a binary tree on my GPU now (c.c. 2.1 so I can) and I’m probably just gonna use the array representation as c.c. 2.1 can’t support recursive function calls. Unless it can…?

njuffa · August 28, 2013, 11:16pm

I was under the impression that the introduction of an ABI with sm_2x, which provides function calling conventions and a stack should allow the use of recursive functions, but I have not tried it and will defer to the documentation. You may need to increase the default stack size depending on the depth of the recursion. I hazard a guess that recursive functions are probably not the fastest way to traverse a tree. The compiler may convert end-recursion into iteration, but I have not looked into that.

MutantJohn · August 29, 2013, 4:11am

I think I’ve tried it before and it told me I need an architecture of 3.5 or higher. But you might be right, Idk. I’ll look into it.

njuffa · August 29, 2013, 5:08am

Best I can determine, recursion has been supported since CUDA 3.1, on sm_2x hardware and up:

[url]https://developer.nvidia.com/cuda-toolkit-31-downloads[/url]

•New language features added to CUDA C / C++ include:
•Support for printf() in device code
•Support for function pointers and recursion make it easier to port many existing algorithms to Fermi GPUs

Alex00 · August 29, 2013, 11:38am

Well, if it is a binary tree, it is easier to use an array representation.

but to answer some questions, the easiest way is to create the tree on the host as an array and send the array, I’m actually working on that right now.

Topic		Replies	Views
Tree structure in CUDA CUDA Programming and Performance	3	2624	June 11, 2017
Cuda Tree Structures CUDA Programming and Performance	3	2617	August 28, 2013
challenges of building trees on gpu pointers, mem allocs CUDA Programming and Performance	2	2776	July 22, 2007
how to create a 2-tree in GPU? Announcements	0	731	December 1, 2016
How to copy a tree to GPU memory allocation for tree on gpu CUDA Programming and Performance	13	2159	June 14, 2011
Build Tree on Device How to create a tree parallely on CUDA CUDA Programming and Performance	1	928	January 30, 2012
Implementing Binary Search Tree on GPU CUDA Programming and Performance	1	11998	July 19, 2011
Passing heirarchical struct to kernel CUDA Programming and Performance	0	515	February 9, 2014
CUDA quadtree!!!! CUDA Programming and Performance	5	4061	July 20, 2013
Tree graph on GPU Move a tree graph structure to the GPU CUDA Programming and Performance	1	754	March 27, 2012

Build (or) Send a Tree on/to the GPU

Related topics