BVH construction on gpu


im trying to implement bvh on gpu following

and got stuck at the bounding box calculation, particularly the part

“The approach I adopt in my paper is to do a parallel bottom-up reduction, where each thread starts from a single leaf node and walks toward the root. To find the bounding box of a given node, the thread simply looks up the bounding boxes of its children and calculates their union. To avoid duplicate work, the idea is to use an atomic flag per node to terminate the first thread that enters it, while letting the second one through. This ensures that every node gets processed only once, and not before both of its children are processed.”

my code (simplified) is like this

struct Point{
    float x, y;

struct BBox{
    Point bottomleft, topright;

struct TreeNode{
    int left, right, parent; // indices of left right children and parent; for leaves left = right = -1; for root, parent = -1
    BBox box;
    int count_arrival; // this is the atomic flag per node mentioned in the above paragraph; initialized to 0

global set_bbox(TreeNode *nodes, …){
    int i = …; // compute thread index
    while(i >= 0){
        int left = nodes[i].left, right = nodes[i].right;
        if(left < 0 && right < 0) i = nodes[i].parent; // at leaf; walk up the tree to parent
            if(atomicAdd(&nodes[i].count_arrival, 1) == 1) break; // first thread arrived stops
            else{ // second thread arrived does the work
                nodes[i].box = union(&nodes[left].box, &nodes[right].box); // device function to calcualte union of boxes
                i = nodes[i].parent; // walk up the tree

with this codes, the internal nodes’ boxes are not calculated correctly, some coordinates are 0s, seemingly indicating children’s boxes are not ready when parent’s box is being set, which i dont understand how/why.

i feel there is something about atomic operation that the blog/paper mentions i dont understand.

please help.

From your description, the atomic flag is only used to avoid computing union(&nodes[left].box, &nodes[right].box); multiple times. However, it does not ensure that nodes[left].box and nodes[right].box have already been calculated.

the flag makes sure that only the second (and last) thread arrived does the work. one of the children’s boxes has been set by the first thread and the other by the second. so when the second thread arrives, both boxes have been set.

i figured out the problem. atomicAdd returns the old value BEFORE adding so the flag check should be

atomicAdd(&nodes[i].count_arrival, 1) == 0