Tracing Non-Homogenous Unstructured Volumes

Several years ago (pre-optix) I wrote a cuda-based BVH algorithm that traces rays through non-homogenous unstructured volumes used to compute physically accurate line-of-sight radiance. These volumes are generated by CFD solvers that provide temperature, pressure, and molecular specie concentrations that vary throughout the volume (a flowfield actually). The CFD volumes are comprised of hexahedrons, pyramids, prisms, … (there may be millions of these elements). The best way I found to represent these types of volumes was to decompose all volume elements into triangles resulting in what I call a large “polygon soup”.

To compute the radiance associated with a particular ray, the BVH algorithm is used to find “all” triangles intersected by the ray. These intersections are not necessarily depth sorted (a requirement for the radiance transport algorithm) so a separate step is used to do that sorting.

Sorry for the long winded introduction here but my question is Optix suitable for this type of application? I have been experimenting with Optix and can use a closest-hit shader to quickly find the entry point of the ray into the volume. However, it seems like it would be very inefficient to continually calling optixTrace to walk a ray through the volume looking for the next closest hit. I would really like to use Optix for this application since its BVH is very fast and leverages RTX cores (something my current solution can’t do). Any guidance or alternative suggestions would be greatly appreciated.

Hi @crow, good question. Sounds like a fun & interesting ray tracing application to me.

Does your volume have any co-planar triangles, and do you need to guarantee that you process all the hits along a ray, including potentially multiple hits for coindicent surfaces at the same t value? I could imagine if you’re traversing irregular cells and need to be guaranteed to get both the enter & exit hits, the need to handle coincident surfaces might be on the table. If so, then you’ll want to use an anyhit program, instead of closest-hit, and a sorting process similar to what you described with your CUDA project. Otherwise, you may have the option to use a closest-hit program while re-casting the ray after every hit, in order to avoid the sorting process.

Anyhit will give you all the hits along a single ray, and not necessarily in sorted order. (Note in OptiX a single primitive is allowed to call anyhit more than once by default. There is an OptiX enum to prevent multiple anyhit calls, if that’s something you need.) The usual difficulty with this approach is that you need to have enough memory already allocated to store the results, and it’s preferable to have a simple static indexing scheme for storing the data vs trying to manage anything dynamic. Presumably you’ve already solved those problems in your previous project, but the performance snag is that saving results to memory and sorting is quite expensive, and possibly more expensive than the ray tracing phase.

Another approach is to do what you’ve started trying: use a closest-hit program, and shoot a new ray after every hit. (Note you may be aware, but just in case: make sure to shoot the new ray in your raygen program after the optixTrace() call returns, and not in your closest hit program, otherwise you’ll have a recursion and use a lot of stack space unnecessarily.) I would think it’s not necessarily slower or less efficient to keep calling optixTrace(), compared to capturing out-of-order hits and sorting them. Using closest-hit is probably going to overall be generally easier and less code. In fact I might recommend starting with the closest-hit & casting new rays approach and see if perf is acceptable, unless you need to guarantee traversal through every primitive along the ray.

With closest hit, you’ll only be able to get one hit for any given t value, and the hit order for any coindicent surfaces is undefined. On top of that you might have to do something to avoid hitting the same primitive twice, or missing the next primitive when it’s very close, or both. The first thing I would try is to always re-cast the exact same ray, e.g., pass the original ray origin, and do not use the previous hit location as the new origin, instead advance the tmin value for the next ray to be the t value of the current hit plus the smallest possible delta (like maybe interpret the t bits as int, add one, then go back to float). I don’t know if this is guaranteed to work, but I’d expect that it will. Otherwise you could look into using a more general Self Intersection Avoidance solution.

OptiX is a decent choice for this problem when you’re traversing through polygon soup. Just for completeness, I should mention that one question to consider is whether you might want or need to handle sampled or voxelized volume data, because OptiX is not really designed for that, and there are useful volume rendering algorithms for structured grids that perform better without involving a BVH.


Thank you very much David for your reply. It was very helpful.

As you suggested, I created a loop around optixTrace() and keep incrementing tmin based on the prior t value (+ epsilon). This loop was executed until the missShader() was called. This worked very well to march rays through the volume and was easily 3X faster than my home-grown BVH solution.

The CFD solvers usually implement adaptive gridding algorithms to cluster volume elements in regions with large temperature and pressure gradients. So as rays are cast through these volumes, they will automatically cluster the ray step size to the regions with high gradients (there are more triangles located at these points). I don’t necessarily need to intersect every triangle along the ray’s path since CFD solutions tend to overgrid their results in order to maintain computational stability.

For co-planar triangles, yes, most of the triangles are derived from quads since the flowfield volume elements tend to have 4-sided co-planar faces. My BVH solution worked with quads which saved an entire level in the BVH tree. Are there any optimizations in Optix that could take advantage of quads rather than triangles?

For coindicent triangles, that shouldn’t be a problem since the dt between them should be zero and can be skipped in the radiance transport integration algorithm.

I will do some experiments with using any-hit programs to get a list of all triangle intersections a ray may encounter. But you are exactly right that that involves alot more logic and resources to store and sort the results. Using my BVH algorithm though, using an any-hit approach was about 2x faster than a stepping approach since there were alot of rendundent BVH node and leaf intersections being done. So another question to you is does Optix’s BVH algorithm always start from the root-node when calling optixTrace or can it start lower in the tree given a prior leaf intersection?

Lastly, I really do appreciate your response. I am also looking at voxelizing the CFD grids I am working with and starting to experiment with NanoVDB to trace rays through these structured volumes.

Thanks again,


1 Like

So another question to you is does Optix’s BVH algorithm always start from the root-node when calling optixTrace or can it start lower in the tree given a prior leaf intersection?

OptiX starts the BVH traversal at the traversable handle you give to the optixTrace() call.

See this discussion about potential issues with that idea:

Note that when using an OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_LEVEL_INSTANCING scene hierarchy, starting traversal from anything other than the top-level IAS will not gain anything on RTX boards, since that path is fully hardware accelerated by the RT cores.

I think you need to switch the OptixTraversableGraphFlags to OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_ANY to be able to trace against other AS than that top-level IAS. (Not tried myself.)
You need that flag anyways if your scene description is more complex than IAS->GAS.


Are there any optimizations in Optix that could take advantage of quads rather than triangles?

I think there are some mild size optimizations for quads, and there are some perf optimizations in Ada hardware that help quads. This is an area of active development, so I expect it to continue to improve.

You can elect to use a custom intersection program that knows how to intersect one or more quads at a time, and that has the potential to save a lot of memory space, but the tradeoff cost is performance - it’ll likely be quite a bit slower than using triangulated quads with the built-in hardware-accelerated triangle intersector.