Move a tree graph structure to the GPU

Hello everyone

I’m stuck with a problem. I’m very new to CUDA, but I’m going to try to implement the Aho-Corasick algorithm on the GPU. The algorithm takes a dictionary of words and builds a tree structure from them, with one character per node, spawning a child node for every new character that does not already exist in the tree. See the image for an example.

The goal here is not the speed of the allocation itself; that will only be done once, after which the program sits in a loop on new input.
I just need to be able to access the node data on the GPU, where I would compare a long array of chars against the tree, traversing it to find all occurrences of dictionary words within that array.

The node-tree data will be stored on the GPU once, and then used by the kernel in a loop, checking against new arrays of chars.

Could anyone please provide a template of how this could be done?
A node in the tree would only require a single char as data, plus a bool saying whether this node is the last character of a word.
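For what it’s worth, one common way to make such a tree copyable to the GPU is to avoid pointers entirely and flatten it into plain arrays, with children referenced by integer index. The sketch below (names and the lowercase a–z alphabet are my assumptions, not from the original post) builds that flat layout on the host; the two arrays could then be transferred to the device with a couple of cudaMemcpy calls and traversed in a kernel by indexing.

```cpp
#include <vector>

// Hypothetical sketch: a character trie flattened into index-based arrays so
// the whole structure can be copied to the GPU and walked without pointers.
// Assumes lowercase 'a'..'z' input (26 child slots per node).
struct FlatTrie {
    // children[node * 26 + (c - 'a')] = index of the child node, or -1 if none.
    std::vector<int>  children;
    std::vector<char> is_word_end;  // 1 if this node ends a dictionary word

    FlatTrie() { add_node(); }      // node 0 is the root

    int add_node() {
        children.insert(children.end(), 26, -1);
        is_word_end.push_back(0);
        return (int)is_word_end.size() - 1;
    }

    void insert(const char* word) {
        int node = 0;
        for (const char* p = word; *p; ++p) {
            int slot = node * 26 + (*p - 'a');
            if (children[slot] == -1)
                children[slot] = add_node();
            node = children[slot];
        }
        is_word_end[node] = 1;      // mark the last character of the word
    }

    // Host-side lookup; a device kernel would do the same indexing on the
    // copied arrays.
    bool contains(const char* word) const {
        int node = 0;
        for (const char* p = word; *p; ++p) {
            node = children[node * 26 + (*p - 'a')];
            if (node == -1) return false;
        }
        return is_word_end[node] != 0;
    }
};
```

Once built, `children.data()` and `is_word_end.data()` are contiguous buffers, so moving the tree to the device is just two copies into device allocations; the kernel then walks the same `node * 26 + c` indexing while scanning the input chars.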

I’m running Windows 7 64-bit
visual studio 2010 ultimate
CUDA 4.1, SDK 4.1 Nsight 2.1

Ouch!

I can’t quite see why you need to do it that way, but clearly your dictionary will be huge, and pretty inefficient:
for each word you find (n characters), you will probably need (n-2) uncached global memory reads and 2 cached memory reads (best-case scenario).
In any case you will need 26 pointers per node, or a hashtable, and this will consume a lot of memory!

Why don’t you use a hash-table instead, which would let you get down to about one global memory read per lookup? (Each read would fetch 128 bits, so words of up to 16 characters would need only one memory read.)
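To illustrate the suggestion, here is a minimal host-side sketch of that idea (the hash function, table size, and names are my own assumptions): whole dictionary words go into a flat open-addressing table, so checking a candidate word costs one probe (plus any collisions) instead of one tree step per character.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch: flat open-addressing hash table of whole words.
// A device version would use fixed 16-byte slots so a single 128-bit
// global read fetches one candidate word.
struct WordHashTable {
    static const size_t kSlots = 1 << 12;        // power of two for cheap masking
    std::vector<std::string> slots{kSlots};      // empty string = free slot

    static uint64_t hash(const std::string& w) { // FNV-1a, an assumed choice
        uint64_t h = 1469598103934665603ull;
        for (char c : w) { h ^= (unsigned char)c; h *= 1099511628211ull; }
        return h;
    }

    void insert(const std::string& w) {
        size_t i = hash(w) & (kSlots - 1);
        while (!slots[i].empty())                // linear probing on collision
            i = (i + 1) & (kSlots - 1);
        slots[i] = w;
    }

    bool contains(const std::string& w) const {
        size_t i = hash(w) & (kSlots - 1);
        while (!slots[i].empty()) {              // probe until a free slot
            if (slots[i] == w) return true;
            i = (i + 1) & (kSlots - 1);
        }
        return false;
    }
};
```

The trade-off versus the trie is that the kernel would have to hash every candidate substring of the input rather than extending a partial match one character at a time, so it suits fixed-length or delimiter-separated words better than true Aho-Corasick streaming matches.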