Premature stop of BVH traversal?

TracingTheCosmos · January 12, 2024, 4:10pm

Hello everyone,

I’m working with simulation algorithms in physics that involve navigating hierarchical structures, like octrees. Since RT cores accelerate tree traversal, I think there is potential in investigating their usage in this domain. I am using CUDA and OptiX. As I understand it, it is currently impossible to supply a custom made tree in OptiX and have it use the RT cores for its traversal (or is it?). Let’s say that I found a workaround by modifying my “geometry” so that the tree created by the builder has my desired properties.

Even so, in order to seriously benefit performance-wise from the hierarchical structure in my applications, I would need the option to stop traversal prematurely, and ideally base the decision to stop prematurely on data which I have attached to the internal nodes.

Is this possible? Or am I better off just using CUDA and leaving my RT cores unused?
Any guidance and discussion is appreciated.
/Michail

dhart · January 12, 2024, 5:36pm

Hi @TracingTheCosmos, welcome!

With respect to the RTX BVH format, the two relevant things to keep in mind are:

The tree is proprietary and is device & feature dependent. The implication of this is that it may not matter whether it’s possible to work around perceived limitations, it’s more that the format can change out from under you for any reason, including driver updates, and it may not operate the way you want it to or think that it does, especially across different devices and unknown input data. The only promises we make and try to keep are the ones represented by the OptiX API.
The RTX BVH currently only has API and hardware support for ray queries. Traversal of the tree for other purposes, such as KNN queries (a common question we get), either need to be simulated using ray queries, or they won’t work. As long as you’re asking about ray query traversal specifically, you should be okay.

In order to stop traversal prematurely, you can use the “any-hit” program type, that’s what it was designed for. This means you insert objects into the tree at whatever point you might want to stop traversal, and then your any-hit program is invoked, and you can decide at that point what to do with the information. We can talk about how you might do this if you feel like getting more specific about your needs - there are some simple & easy workflows, and there are some much more involved workflows. You can conditionally skip over sub-trees using any-hit programs by building the sub-trees separately from the root tree, and only traversing the sub-tree via your any-hit program. It’s also possible that any-hit programs don’t provide as much flexibility as you’d like. You can’t, for example, stop traversal dynamically or arbitrarily anywhere in the tree without first putting a static object into the tree that has an any-hit program attached.

–
David.

TracingTheCosmos · January 23, 2024, 3:03pm

Hello @dhart, thank you for the comprehensive reply.

The tree is proprietary and is device & feature dependent. The implication of this is that it may not matter whether it’s possible to work around perceived limitations, it’s more that the format can change out from under you for any reason, including driver updates, and it may not operate the way you want it to or think that it does, especially across different devices and unknown input data. The only promises we make and try to keep are the ones represented by the OptiX API.

I understand, makes sense if the hardware is still being improved. Exposing it would constrain you compatibility wise.

The RTX BVH currently only has API and hardware support for ray queries. Traversal of the tree for other purposes, such as KNN queries (a common question we get), either need to be simulated using ray queries, or they won’t work. As long as you’re asking about ray query traversal specifically, you should be okay.

That is not a problem in my case. My queries can be expressed as ray queries by shooting an “infinitesimal” ray from my point of interest.

In order to stop traversal prematurely, you can use the “any-hit” program type , that’s what it was designed for. This means you insert objects into the tree at whatever point you might want to stop traversal, and then your any-hit program is invoked, and you can decide at that point what to do with the information.

This shader, as you indicated later on in your reply, is invoked when the ray intersects with a geometry, i.e., leaf nodes. So if I want to achieve a behavior analogous to what I am describing, I will need to come up with some form of hierarchical geometry, or am I misunderstanding?

Can you offer some resources or examples of the workflows you describe? I think that would be very useful.

Thank you, and all the best!

dhart · January 24, 2024, 6:52pm

The potential workflow I’m suggesting is based on me making some possibly incorrect assumptions about what might work for you. My idea above would involve identifying specific nodes in the tree where you might want to optionally stop traversal, and splitting up the tree manually at those nodes. This won’t really work if what you’re after is the ability to stop traversal dynamically at any node in the tree, but might be reasonable if you can statically partition your tree into an upper tree attached to the root, and lower trees that contain the leaves.

Some people call this idea ‘electric bounding boxes’, and it gets used in the context of on-demand geometry loading. For example, maybe all of my geometry doesn’t fit in memory, so I create intermediate leaf node bounding boxes that are stand-ins for a complex mesh, and wait to load the complex meshes until a ray hits one of these bounding boxes. These bounding boxes will be the leaf nodes of the upper tree that is attached to the scene root node. When the complex mesh is loaded, we can build an acceleration structure for just the mesh only, and optionally traverse it any time a ray hits the corresponding leaf bounding box. The mesh acceleration structure represents a lower tree that we can choose to traverse optionally. Or alternatively we can put the lower tree into the upper tree as part of a tree rebuild.

BTW note that idea here with demand-loaded geometry is to fill the electric bounding box with a lower tree “on demand” during the time between kernel launches, it is not something you would typically respond to dynamically during a launch. In your case, if you can identify electric bounding boxes in advance, and build the lower trees in advance, then you’ll be able to use them at will. If this seems at all useful for you, we do have a sample for demand-loading geometry in the OptiX Toolkit called “DemandGeometry”, which is inside the “DemandLoading” sub-repo.

If instead you want to decide at any node (or every node) in the hierarchy whether to continue traversal, then in that case OptiX may not be the best choice, and it might be better to use a custom tree in CUDA. However, it is worth considering the possibility that dynamic software traversal might be slower than static hardware over-traversal of the scene. One option might be to let traversal do it’s thing and then decide after the fact whether to split or discard the ray. It all depends on what kind of algorithm and data you’re after, I’m not sure my suggestions here are helpful.

–
David.

Topic		Replies	Views
Extracting BVH from OptiX to manually traverse OptiX cuda	3	1331	November 28, 2022
BVH questions OptiX	6	2104	June 14, 2022
API for BVH Traversal on Turing GPUs CUDA Programming and Performance	20	5733	February 23, 2019
ray tracer choosing tools CUDA Programming and Performance	24	34171	May 20, 2008
How to use RT Core to accelerate software-implemented BVH where I need to access internal nodes? OptiX	1	49	April 2, 2026
Hardware Accelerated Bounding Volume Hierarchy CUDA Programming and Performance	1	340	October 5, 2022
A few questions about BVH traversal engine and triangle intersection engine OptiX kernel , data-science , architecture-and-design	1	1463	November 10, 2022
Does BVH construction use RT core? OptiX	1	278	July 12, 2024
Tracing Non-Homogenous Unstructured Volumes OptiX	4	709	November 6, 2023
Is there any performance difference implementing a ray-tracer in cuda vs. rendering pipelines? CUDA Programming and Performance	7	2982	March 2, 2019

Premature stop of BVH traversal?

Related topics