BVH construction on GPU

huyleq1989 · November 7, 2023, 2:19am

Hi.

I’m trying to implement a BVH on the GPU following a blog post, and I got stuck at the bounding-box calculation, particularly this part:

“The approach I adopt in my paper is to do a parallel bottom-up reduction, where each thread starts from a single leaf node and walks toward the root. To find the bounding box of a given node, the thread simply looks up the bounding boxes of its children and calculates their union. To avoid duplicate work, the idea is to use an atomic flag per node to terminate the first thread that enters it, while letting the second one through. This ensures that every node gets processed only once, and not before both of its children are processed.”
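To make the arrival rule in the quote concrete, here is a small self-contained sketch of the per-node flag on its own (not code from the blog or paper). The names second_to_arrive, arrival_demo and count_single_pass_nodes are hypothetical. Two threads are mapped to each node. Because atomicAdd returns the counter's value from before the increment, exactly one of them (the second to arrive) sees 1 and is let through.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Per-node arrival flag (sketch). atomicAdd returns the value the counter
// held *before* the increment: the first arriving thread sees 0, the second sees 1.
__device__ bool second_to_arrive(int *flag) {
    return atomicAdd(flag, 1) == 1;
}

// Hypothetical test kernel: threads 2k and 2k+1 both "arrive" at node k.
// passed[k] counts how many of the two were allowed to continue.
__global__ void arrival_demo(int *flags, int *passed, int num_nodes) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= 2 * num_nodes) return;
    int node = tid / 2;
    if (second_to_arrive(&flags[node]))
        atomicAdd(&passed[node], 1);
}

// Hypothetical host helper: returns how many nodes let exactly one thread through.
int count_single_pass_nodes(int num_nodes) {
    int *flags, *passed;
    cudaMalloc(&flags, num_nodes * sizeof(int));
    cudaMalloc(&passed, num_nodes * sizeof(int));
    cudaMemset(flags, 0, num_nodes * sizeof(int));   // flags must start at 0
    cudaMemset(passed, 0, num_nodes * sizeof(int));
    int threads = 2 * num_nodes;
    arrival_demo<<<(threads + 255) / 256, 256>>>(flags, passed, num_nodes);
    int *h_passed = new int[num_nodes];
    // cudaMemcpy on the default stream waits for the kernel to finish
    cudaMemcpy(h_passed, passed, num_nodes * sizeof(int), cudaMemcpyDeviceToHost);
    int count = 0;
    for (int k = 0; k < num_nodes; ++k) count += (h_passed[k] == 1);
    delete[] h_passed;
    cudaFree(flags);
    cudaFree(passed);
    return count;
}
```

For example, count_single_pass_nodes(1000) should return 1000. Every node lets exactly one of its two threads continue, regardless of which one gets there first.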

My code (simplified) looks like this:

struct Point {
    float x, y;
};

struct BBox {
    Point bottomleft, topright;
};

struct TreeNode {
    int left, right, parent; // indices of the left/right children and the parent; for leaves, left = right = -1; for the root, parent = -1
    BBox box;
    int count_arrival; // the atomic flag per node mentioned in the paragraph above; initialized to 0
};

__global__ void set_bbox(TreeNode *nodes, …) {
    int i = …; // compute thread index
    while (i >= 0) {
        int left = nodes[i].left, right = nodes[i].right;
        if (left < 0 && right < 0) i = nodes[i].parent; // at a leaf; walk up the tree to the parent
        else {
            if (atomicAdd(&nodes[i].count_arrival, 1) == 1) break; // the first thread to arrive stops
            else { // the second thread to arrive does the work
                nodes[i].box = union(&nodes[left].box, &nodes[right].box); // device function to calculate the union of two boxes
                i = nodes[i].parent; // walk up the tree
            }
        }
    }
}

With this code, the internal nodes’ boxes are not calculated correctly: some coordinates are 0, which seems to indicate that the children’s boxes are not ready when the parent’s box is being set. I don’t understand how or why.

I feel there is something about the atomic operation mentioned in the blog/paper that I don’t understand.

Please help.

striker159 · November 7, 2023, 7:17am

From your description, the atomic flag is only used to avoid computing the union of nodes[left].box and nodes[right].box multiple times. However, it does not ensure that nodes[left].box and nodes[right].box have already been calculated.

huyleq1989 · November 7, 2023, 10:14am

The flag makes sure that only the second (and last) thread to arrive does the work. One of the children’s boxes has been set by the first thread and the other by the second, so when the second thread arrives, both boxes have been set.

huyleq1989 · November 7, 2023, 10:15am

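I figured out the problem. atomicAdd returns the old value, from BEFORE the addition, so the first thread to arrive gets 0 and the second gets 1. With == 1, it was the first thread that went on to compute the union (before its sibling’s box was ready), while the second one stopped. The flag check should be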

atomicAdd(&nodes[i].count_arrival, 1) == 0
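Putting the fix together, here is a minimal sketch of the corrected bottom-up pass. It keeps the Point/BBox/TreeNode layout from the question. The kernel parameters first_leaf and num_leaves, the helpers box_union, vload, load_box and make_node, and the host driver run_example with its 7-node tree are all hypothetical. It also adds something the thread does not discuss: a __threadfence() before each atomicAdd and volatile loads of the children's boxes. These are a conservative assumption meant to ensure that the box written by the first-arriving thread is visible to the second one rather than read from a stale cache. Without them, even the == 0 check may occasionally read stale boxes.

```cuda
#include <cassert>
#include <cuda_runtime.h>

struct Point { float x, y; };
struct BBox { Point bottomleft, topright; };
struct TreeNode {
    int left, right, parent;  // -1 for a missing child / the root's parent
    BBox box;
    int count_arrival;        // must be 0 before every launch
};

__device__ BBox box_union(const BBox &a, const BBox &b) {
    BBox r;
    r.bottomleft.x = fminf(a.bottomleft.x, b.bottomleft.x);
    r.bottomleft.y = fminf(a.bottomleft.y, b.bottomleft.y);
    r.topright.x = fmaxf(a.topright.x, b.topright.x);
    r.topright.y = fmaxf(a.topright.y, b.topright.y);
    return r;
}

// Volatile loads so the sibling's box is re-read from memory, not a stale cached copy.
__device__ float vload(const float *p) { return *(const volatile float *)p; }
__device__ BBox load_box(const TreeNode *n) {
    BBox b;
    b.bottomleft.x = vload(&n->box.bottomleft.x);
    b.bottomleft.y = vload(&n->box.bottomleft.y);
    b.topright.x = vload(&n->box.topright.x);
    b.topright.y = vload(&n->box.topright.y);
    return b;
}

// Assumption: leaves occupy indices first_leaf .. first_leaf + num_leaves - 1
// and their boxes are filled in before the launch. One thread per leaf.
__global__ void set_bbox(TreeNode *nodes, int first_leaf, int num_leaves) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= num_leaves) return;
    int i = nodes[first_leaf + t].parent;  // leaf box is given; start at its parent
    while (i >= 0) {
        __threadfence();  // publish the child box this thread wrote before signaling
        if (atomicAdd(&nodes[i].count_arrival, 1) == 0) return;  // first arrival stops
        __threadfence();  // second arrival: order the reads after the atomic
        BBox l = load_box(&nodes[nodes[i].left]);
        BBox r = load_box(&nodes[nodes[i].right]);
        nodes[i].box = box_union(l, r);
        i = nodes[i].parent;  // walk up the tree
    }
}

static TreeNode make_node(int l, int r, int p, float x0, float y0, float x1, float y1) {
    TreeNode n;
    n.left = l; n.right = r; n.parent = p;
    n.box.bottomleft.x = x0; n.box.bottomleft.y = y0;
    n.box.topright.x = x1; n.box.topright.y = y1;
    n.count_arrival = 0;
    return n;
}

// Hypothetical driver: root 0, internal nodes 1 and 2, leaves 3..6.
// Internal boxes start at 0 and are filled in by the kernel.
BBox run_example(int query) {
    TreeNode h[7] = {
        make_node(1, 2, -1, 0, 0, 0, 0),
        make_node(3, 4, 0, 0, 0, 0, 0),
        make_node(5, 6, 0, 0, 0, 0, 0),
        make_node(-1, -1, 1, 0, 0, 1, 1),
        make_node(-1, -1, 1, 2, -1, 3, 0),
        make_node(-1, -1, 2, -2, 1, -1, 2),
        make_node(-1, -1, 2, 4, 5, 5, 6),
    };
    TreeNode *d;
    cudaMalloc(&d, sizeof(h));
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
    set_bbox<<<1, 32>>>(d, 3, 4);
    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(d);
    return h[query].box;
}
```

With this tree, run_example(0) should return a root box with bottomleft (-2, -1) and topright (5, 6). Since count_arrival is consumed by each pass, it has to be reset to 0 before running the kernel again on the same nodes.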
