kd-tree construction with CUDA


I’m working on 3D scan matching which needs a fast nearest neighbour search approach. I tried GLSL with a grid based brute force algorithm. Now I need to apply some more advanced structure to make improvement. But it’s difficult to implement complicated algorithms on GLSL.

Now I wanna build a kd-tree by CUDA. Look my kd-tree is based on 3D points, meaning, each intermediate node contains a point. So it’s quite similar to what photon mapping is using.

Does anyone have experience of building a GPU based kd-tree? Or adding GPU based NNS? I really can’t figure out how to make the procedure like SIMD style.



Well now I’m implementing kd-tree builder for raycasting purposes, and cuda is quite nice suited for this, i’v got ~2 seconds build time on my 8800 GT for ‘sponza’ scene

(~70000 triangles, depth 21, full SAH, and 90% of this time is SAH calculations.

My approach is to first do some precalcs, (make arrays of splitters, triangle minmax’es) and then assign each block of threads to construct one tree node.

So at level 0 you have just one block of (say 128) threads that constructs the root,

(this does not scale wery well, but root node is fast)

at level 1 you have 2 blocks of threads constructing 2 nodes and so on.

When you reach level 10, (1024 nodes) the better way is to switch to one thread per node, and as many blocks as needed shema since nodes gets smaller and smaller, and 128 threads per node is a huge waste.

I’m allocating all the memory required for nodes prior to tree construction

(for depth = 21 its 16MBytes assuming you use 1 unsigned int per node,

30 bits for splitting plane distance, 2 bits for split direction (X, Y, Z, LEAF))

Your KD-Tree of course need to be different from what i’m constructing, since my target is to cut as much empty space as possible, while as you mentioned you need split at points, but the idea is that same.

Mmm… sounds very successful. By the way, I also read Kun Zhou’s paper. But unfortunately, he doesn’t present detail ideas or pseudo codes for the photon mapping kd-tree construction, especially when this is the first time I’m using CUDA. I really need some technical details for a further step. Here I got my at-the-moment question c if you can help:

My initial data are a 2D array of x, y, z, and index times number of points. At each node, I sort the points in the node, and find the median of the longest axis as the splitting plain. By now, CUDPP doesn’t support 2D array sorting (what they called key-value sort). I don’t know how the problem is converted into a flat array sort. Or sort is not needed for each splitting?!

I wonder if it’s possible that u can show me your thesis or report of the tree construction if you have any. Or even, which is the best, if I can see the code! ;-)

so basically your point definition is like this:
struct point
float3 pos;
uint index;
right ?
and you have a list of those points
point allpoints;

if the number of points is relativelly small (say this array is less than 64MB or so)
you could duplicate it 3 times and write 3 sorting kernels that will sort 1st array by point.pos.x, 2nd array by point.pos.y, 3rd by point.pos.z.
This way you sort only once as preprocess step, if array is large and memory limits
does not allow its 3 duplicates, just use 3 index arrays and sort them (in respect to x, y, z)

Compute bounding box of your points (using another set of kernels).

Now when you construct your node, you know its bbox (its a root node box slashed by all parent nodes up to that one you are calculating now), and using a binary search on each array you could tell what is your starting and ending index in each axis, this way you greatly reduce the number of poinsts that you need to ‘sample’ in each kernel

Now on all threads of your constructing kernel calculate median from points within each index pair (minIndexX to maxIndexX, minIndexY to maxIndexY, etc.)
and decide what’s your splitting plane and axis is.
In your case olso remember minIndex and maxIndex index.
(if you need points list only for leafs, save the memory and spit this information only in leafs nodes)

Then when the traverse time comes, you will know that to this particular node
belongs triangles from list(X,Y or Z) from index (minIndex to maxIndex).

I hope that helps you.
Unfortunatelly i have no thesis on my kdbuilder as it is an internal application that accelerates our lightmap production process in company that i work for.
For the reason above no source code … sorry :)

Thank you for this valuable guidance.