hi.
im trying to implement bvh on gpu following
and got stuck at the bounding box calculation, particularly the part
“The approach I adopt in my paper is to do a parallel bottom-up reduction, where each thread starts from a single leaf node and walks toward the root. To find the bounding box of a given node, the thread simply looks up the bounding boxes of its children and calculates their union. To avoid duplicate work, the idea is to use an atomic flag per node to terminate the first thread that enters it, while letting the second one through. This ensures that every node gets processed only once, and not before both of its children are processed.”
my code (simplified) is like this
struct Point{
float x, y;
};
struct BBox{
Point bottomleft, topright;
};
struct TreeNode{
int left, right, parent; // indices of left right children and parent; for leaves left = right = -1; for root, parent = -1
BBox box;
int count_arrival; // this is the atomic flag per node mentioned in the above paragraph; initialized to 0
};
global set_bbox(TreeNode *nodes, …){
int i = …; // compute thread index
while(i >= 0){
int left = nodes[i].left, right = nodes[i].right;
if(left < 0 && right < 0) i = nodes[i].parent; // at leaf; walk up the tree to parent
else{
if(atomicAdd(&nodes[i].count_arrival, 1) == 1) break; // first thread arrived stops
else{ // second thread arrived does the work
nodes[i].box = union(&nodes[left].box, &nodes[right].box); // device function to calcualte union of boxes
i = nodes[i].parent; // walk up the tree
}
}
}
}
with this codes, the internal nodes’ boxes are not calculated correctly, some coordinates are 0s, seemingly indicating children’s boxes are not ready when parent’s box is being set, which i dont understand how/why.
i feel there is something about atomic operation that the blog/paper mentions i dont understand.
please help.