hi.

im trying to implement bvh on gpu following

and got stuck at the bounding box calculation, particularly the part

“The approach I adopt in my paper is to do a parallel bottom-up reduction, where each thread starts from a single leaf node and walks toward the root. To find the bounding box of a given node, the thread simply looks up the bounding boxes of its children and calculates their union. To avoid duplicate work, the idea is to use an atomic flag per node to terminate the first thread that enters it, while letting the second one through. This ensures that every node gets processed only once, and not before both of its children are processed.”

my code (simplified) is like this

struct Point{

float x, y;

};

struct BBox{

Point bottomleft, topright;

};

struct TreeNode{

int left, right, parent; // indices of left right children and parent; for leaves left = right = -1; for root, parent = -1

BBox box;

int count_arrival; // this is the atomic flag per node mentioned in the above paragraph; initialized to 0

};

**global** set_bbox(TreeNode *nodes, …){

int i = …; // compute thread index

while(i >= 0){

int left = nodes[i].left, right = nodes[i].right;

if(left < 0 && right < 0) i = nodes[i].parent; // at leaf; walk up the tree to parent

else{

if(atomicAdd(&nodes[i].count_arrival, 1) == 1) break; // first thread arrived stops

else{ // second thread arrived does the work

nodes[i].box = union(&nodes[left].box, &nodes[right].box); // device function to calcualte union of boxes

i = nodes[i].parent; // walk up the tree

}

}

}

}

with this codes, the internal nodes’ boxes are not calculated correctly, some coordinates are 0s, seemingly indicating children’s boxes are not ready when parent’s box is being set, which i dont understand how/why.

i feel there is something about atomic operation that the blog/paper mentions i dont understand.

please help.