Hi, I think it's hard to build trees such as a B-tree on the GPU. The reasons are:
1. Since host pointers can't be dereferenced on the GPU, pointers to child nodes have to be encoded as offsets.
2. Kernels can't dynamically allocate new nodes.
3. Even once the tree is built, searching it is hard, since the GPU dislikes pointer chasing in loops.
Even copying an already-built tree from main memory to the GPU is hard, because:
1. As with serializing to disk, the pointers are no longer meaningful once the nodes leave host memory; page numbers or relative offsets must be calculated (see the sketch after this list).
2. Data types such as doubles and chars may have a different byte length or alignment on the GPU, so a bytewise cudaMemcpy may scramble the structure.
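For what it's worth, here is a minimal host-side sketch of that pointer-to-offset conversion. It uses a plain binary tree rather than a full B-tree for brevity, and the struct names and traversal are made up just to illustrate the idea:

```cpp
// Flatten a pointer-based host tree into an index-based array that can be
// copied to the GPU with a single cudaMemcpy. HostNode/DeviceNode are
// hypothetical types used only for this illustration.
#include <vector>
#include <cuda_runtime.h>

struct HostNode {                 // ordinary pointer-based node on the CPU
    int key;
    HostNode* left;
    HostNode* right;
};

struct DeviceNode {               // same node, but children stored as indices
    int key;
    int left;                     // index into the flat array, -1 if absent
    int right;
};

// Preorder walk that appends each node to 'out' and returns its slot index.
static int flatten(const HostNode* n, std::vector<DeviceNode>& out)
{
    if (!n) return -1;
    int idx = (int)out.size();
    out.push_back({n->key, -1, -1});
    int l = flatten(n->left, out);   // recurse first, then patch the indices,
    int r = flatten(n->right, out);  // because push_back may reallocate 'out'
    out[idx].left  = l;
    out[idx].right = r;
    return idx;
}

DeviceNode* uploadTree(const HostNode* root, int* count)
{
    std::vector<DeviceNode> flat;
    flatten(root, flat);
    *count = (int)flat.size();

    DeviceNode* d_nodes = nullptr;
    cudaMalloc((void**)&d_nodes, flat.size() * sizeof(DeviceNode));
    cudaMemcpy(d_nodes, flat.data(),
               flat.size() * sizeof(DeviceNode), cudaMemcpyHostToDevice);
    return d_nodes;                  // node 0 is the root
}
```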
Depending on whether you need to change the tree structure (not the data), you could always represent the tree with arrays, actually two of them: one for the data and one for the indices of the children (see the sketch below). But you have to take into account that if your tree is big and doesn't fit into cached memory, such memory accesses will suffer from high latency.
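A minimal sketch of that two-array layout with a search kernel that chases indices instead of pointers. The binary-tree shape, the kernel name, and the convention of two child indices per node are assumptions for illustration only:

```cuda
// keys[] holds the data, children[] holds two child indices per node
// (children[2*i] = left, children[2*i+1] = right, -1 = no child).
#include <cuda_runtime.h>

__global__ void searchKernel(const int* keys, const int* children,
                             const int* queries, int* results, int numQueries)
{
    int q = blockIdx.x * blockDim.x + threadIdx.x;
    if (q >= numQueries) return;

    int target = queries[q];
    int node = 0;                          // index of the root
    int found = -1;
    while (node != -1) {                   // index chasing instead of pointer chasing
        int k = keys[node];
        if (k == target) { found = node; break; }
        node = (target < k) ? children[2 * node] : children[2 * node + 1];
    }
    results[q] = found;                    // -1 if the key is not in the tree
}
```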
Thank you very much! Yeah, that's true about the latency.
But I think if we store the first several levels of the tree in shared memory, the next few levels in the constant buffer, and all of the lower levels in textures, then perhaps the latency could be reduced?
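For the shared-memory part of that idea, here is a rough sketch, assuming the nodes are stored in level order so the top levels occupy the first entries of the arrays; TOP_NODES and the kernel name are assumptions, and the constant/texture staging is left out:

```cuda
#define TOP_NODES 1023     // e.g. the first ten levels of a binary tree (~12 KB here)

__global__ void searchWithCachedTop(const int* keys, const int* children,
                                    const int* queries, int* results,
                                    int numQueries)
{
    __shared__ int sKeys[TOP_NODES];
    __shared__ int sChildren[2 * TOP_NODES];

    // cooperatively stage the top levels of the tree into shared memory
    for (int i = threadIdx.x; i < TOP_NODES; i += blockDim.x) {
        sKeys[i] = keys[i];
        sChildren[2 * i]     = children[2 * i];
        sChildren[2 * i + 1] = children[2 * i + 1];
    }
    __syncthreads();

    int q = blockIdx.x * blockDim.x + threadIdx.x;
    if (q >= numQueries) return;

    int target = queries[q];
    int node = 0;
    int found = -1;
    while (node != -1) {
        bool cached = node < TOP_NODES;    // read from shared memory while we can
        int k = cached ? sKeys[node] : keys[node];
        if (k == target) { found = node; break; }
        int slot = 2 * node + (target < k ? 0 : 1);
        node = cached ? sChildren[slot] : children[slot];
    }
    results[q] = found;
}
```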
The fatal weakness of keeping this tree in GRAM is that insert/delete seems impossible, since updating a shared tree in parallel is very difficult and needs locking protocols and so on.