What is the best way to explicitly descend the scene graph? (to implement sampling of arbitrary mesh lights)


I’m trying to implement a full-featured path tracer using OptiX.

Direct sampling of light-emitting scene geometry first requires randomly selecting a primitive.

One way to achieve this is to pack the emitting primitives into a single buffer and select from them.
However, this approach immediately becomes troublesome when moving a light object (the light primitives inside the buffer need to be transformed on the host side) or when emitting objects are instanced multiple times (a massive light geometry buffer needs to be packed).

Therefore, a hierarchical approach, as in rtTrace(), seems practical.
What I imagine is the following.

  1. Call an explicit-descent function like rtVisit(topObject, payload). That function invokes a program for topObject, like a selector visit program.
  2. Randomly select a child of the current node object in the program and recursively call rtVisit() for the selected child, accumulating the transform if required.
  3. Finally, the program is called for a GeometryInstance scope, where we can access the vertex buffer and index buffer and select a primitive.
  4. Sample from the selected primitive and apply the transform.

However, as you know, there is no rtVisit().
A workaround might be creating a scene graph myself that mirrors the OptiX-native one.
(I think this is possible by exploiting bindless resources.)
But this is a bit complicated, and I would need to maintain two scene graphs (the OptiX-native one and mine).

Is there a smarter way?

I think the better question topic would have been “How do I implement explicit sampling of arbitrary mesh lights?” and that’s what I’m going to explain.

Simply forget about traversing through the scene graph for that. Even if that could be done with Selector nodes, that wouldn’t be fast. Definitely don’t do that.

Let’s assume global illumination in a uni-directional path tracer with next event estimation.
The lighting evaluation happens in two cases:

1.) Implicit light hits, meaning a sampled ray randomly hits an emissive surface.
The emission distribution function (EDF) evaluation for that case happens inside the closest hit program of that material. There could also be a BSDF in that material, so the EDF is normally evaluated first and initializes the returned radiance.
Inside the closest hit program you have everything available which is needed to calculate the vertex attributes in world space coordinates.

2.) Explicit light sampling, and this is where it gets a little involved.
You would store all lights inside a buffer of structs. The fields in that struct would need to contain all information to uniformly (or importance-) sample a point on the light source.

You need to be able to calculate a light sample point in world space and evaluate the EDF for that.
If you want to transform and instance lights multiple times, then you need the following:

  • The object to world transformation.
  • If you want to support non-uniform scaling (not recommended, breaks the CDF over triangle areas!), you also need the inverse transpose matrix to properly transform the normals. (Note that tangents are not transformed with the inverse transpose. They are in the same plane as the geometry.)
  • The object coordinates of the geometry. This is the data in the attribute buffers assigned to the Geometry in your OptiX scene graph. You can access that explicitly via bindless buffer IDs. That is one reason why I’m putting all attributes into one buffer.
  • If your mesh is indexed, the indices defining the topology of your mesh. Same thing, use bindless buffer IDs.
    This means that, except for two integers, this needs no additional memory, and no traversal is required to access any of the light geometry.
  • To be able to uniformly sample the light surface you need a cumulative distribution function (CDF) over the areas of the individual triangles.
    (If the EDF is not uniform over the area but is static, it would be possible to do importance sampling. If the EDF is not static (procedurally animated, a movie, etc.), forget about importance sampling.)
    That CDF also automatically takes care of never sampling triangles of zero area.
    (If you want to support non-uniform scaling, that pre-calculated CDF wouldn’t work, so I would highly recommend using only uniform scaling; then you don’t need the inverse transpose matrix either.)
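The per-light data and the area CDF described in the list above can be sketched as follows. This is a minimal host-side sketch, not code from the OptiX Introduction examples; the struct fields, the helper names, and the `Vec3` stand-in for `float3` are all assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Minimal stand-in for float3; in OptiX device code these would be
// float3 operations and rtBufferId<> handles. All names are hypothetical.
struct Vec3 { float x, y, z; };

inline Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 cross(const Vec3& a, const Vec3& b) {
  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
inline float length(const Vec3& v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

// One entry per light instance; the geometry is referenced via bindless
// buffer IDs, not copied.
struct LightDefinition {
  float objectToWorld[12]; // 3x4 row-major object-to-world transform
  int   attributesID;      // bindless buffer ID of the vertex attributes
  int   indicesID;         // bindless buffer ID of the triangle indices
  int   cdfID;             // bindless buffer ID of the per-triangle area CDF
  float totalArea;         // summed area of all triangles
};

// Build a normalized CDF over triangle areas (host side, once per light mesh).
// cdf.back() == 1, and zero-area triangles can never be selected because they
// add nothing to the cumulative sum.
std::vector<float> buildAreaCdf(const std::vector<Vec3>& verts,
                                const std::vector<int>& indices,
                                float* totalAreaOut) {
  std::vector<float> cdf;
  cdf.reserve(indices.size() / 3);
  float sum = 0.0f;
  for (size_t i = 0; i + 2 < indices.size(); i += 3) {
    const Vec3 e0 = sub(verts[indices[i + 1]], verts[indices[i]]);
    const Vec3 e1 = sub(verts[indices[i + 2]], verts[indices[i]]);
    sum += 0.5f * length(cross(e0, e1)); // triangle area = |e0 x e1| / 2
    cdf.push_back(sum);
  }
  for (float& c : cdf) c /= sum; // normalize to [0, 1]
  *totalAreaOut = sum;
  return cdf;
}
```

Note that this assumes uniform scaling, so the CDF built from object-space areas stays valid in world space.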

Now you would need to implement a function which explicitly samples an arbitrary mesh light.
You need four random numbers in the range [0.0f, 1.0f) for the following.

  • Let’s assume you sample one of many lights per next event estimation. The first random number simply picks one light definition inside your buffer of light structs.

Note that when picking one of many lights, the emitted radiance needs to be multiplied by the inverse probability of sampling that one light. That is, multiply the emission by the number of lights. Do not merge that probability into the light sample’s probability density function (PDF) result! That would be incorrect in the final lighting calculation. (Find example code in my OptiX Introduction examples.)

  • The second random number is used to pick a triangle.
    This is done by sampling the CDF over the triangle areas. The resulting index is the triangle index in your index buffer accessible via that bindless buffer ID in the light definition from step one.
    (That CDF is only needed if the triangles are not all the same area.) Note that the light sample’s PDF also contains the ratio between triangle area and whole light area.
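Picking the triangle via the CDF is a simple inverse-CDF lookup, sketched below. `sampleTriangleIndex` is a hypothetical name; on the device the search would run over a bindless buffer rather than a `std::vector`, but the binary search itself is identical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Pick a triangle index proportional to its area by inverting the CDF.
// cdf is normalized (cdf.back() == 1), xi is a random number in [0, 1).
int sampleTriangleIndex(const std::vector<float>& cdf, float xi) {
  // upper_bound finds the first entry > xi, i.e. the first triangle whose
  // cumulative area exceeds the random threshold.
  std::vector<float>::const_iterator it = std::upper_bound(cdf.begin(), cdf.end(), xi);
  if (it == cdf.end()) --it; // guard against xi rounding up to 1.0
  return static_cast<int>(it - cdf.begin());
}
```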

Now with that index you can load the vertex coordinates and other attributes from the attribute buffer via its buffer ID in the light definition.

Since you want transformed instances, you need to transform the object-space coordinates into world coordinates.
Now you can calculate the effective area of that triangle, which is needed to project it to solid angle. You also need the geometric normal to get the cosine theta for the foreshortening of the area.
Both are available from the cross product of two triangle edges.
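That cross product step might look like the following sketch. The names are illustrative stand-ins; in device code these would be `float3` operations on the already world-space-transformed vertices.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// One cross product of two world-space triangle edges yields both the
// geometric normal (after normalization) and twice the triangle area
// (the length of the cross product).
void triangleNormalAndArea(const Vec3& p0, const Vec3& p1, const Vec3& p2,
                           Vec3* normal, float* area) {
  const Vec3 e0 = {p1.x - p0.x, p1.y - p0.y, p1.z - p0.z};
  const Vec3 e1 = {p2.x - p0.x, p2.y - p0.y, p2.z - p0.z};
  const Vec3 c  = {e0.y * e1.z - e0.z * e1.y,
                   e0.z * e1.x - e0.x * e1.z,
                   e0.x * e1.y - e0.y * e1.x};
  const float len = std::sqrt(c.x * c.x + c.y * c.y + c.z * c.z);
  *area   = 0.5f * len;                        // |e0 x e1| / 2
  *normal = {c.x / len, c.y / len, c.z / len}; // unit geometric normal
}
```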

Depending on the complexity of your light (EDF only on the front face, or on both, possibly with different EDFs on both sides, etc.) you calculate the cosine theta first and reject the light sample when looking at a back side. (Set the pdf to 0.0f.)

Now with the next two random numbers, you sample that triangle uniformly and get the barycentric coordinates.
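The uniform mapping from two random numbers to barycentric coordinates is the classic square-root warping; the function name below is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Map two uniform random numbers in [0, 1) to barycentric coordinates that
// sample the triangle uniformly by area.
void sampleTriangleUniform(float xi1, float xi2,
                           float* alpha, float* beta, float* gamma) {
  const float s = std::sqrt(xi1);
  *alpha = 1.0f - s;         // weight of vertex 0
  *beta  = s * (1.0f - xi2); // weight of vertex 1
  *gamma = s * xi2;          // weight of vertex 2
}
```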

With that you can interpolate all necessary attributes needed to evaluate the EDF. The code already exists in your intersection and the closest hit programs doing the EDF evaluation. Copy the minimal necessary parts into your explicit mesh light sampling function.

That function needs to return the light sample world position, direction from surface point to light, distance, emitted radiance, and pdf.
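For the returned PDF, a common convention is to convert the area-measure density 1 / totalArea into a density with respect to solid angle at the shading point, using the distance and the cosine at the light mentioned above. This sketch assumes that convention; the struct layout and function names are made up for illustration.

```cpp
#include <cassert>

// Hypothetical result struct matching the quantities listed above.
struct LightSample {
  float position[3];  // world-space sample point on the light
  float direction[3]; // unit vector from shading point to the sample
  float distance;     // distance to the sample
  float radiance[3];  // emitted radiance
  float pdf;          // probability density w.r.t. solid angle
};

// Convert the area-measure density 1 / totalArea to a solid-angle density.
// cosThetaLight is the cosine between the light's geometric normal and the
// direction back to the shading point; a non-positive value means the back
// side was sampled, so the sample is rejected by returning pdf = 0.
float areaPdfToSolidAngle(float totalArea, float distance, float cosThetaLight) {
  if (cosThetaLight <= 0.0f || totalArea <= 0.0f) return 0.0f;
  return (distance * distance) / (cosThetaLight * totalArea);
}
```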

You would then need to check the visibility of the light sample and do your usual direct lighting calculations.

The OptiX Introduction examples I wrote contain a “LightDefinition” structure which is using bindless buffer IDs for CDFs for importance sampled HDR environment maps. The attribute and index buffers of mesh lights would need to be added in the same way.

To give you an impression of possible complexity, I implemented arbitrary triangle mesh lights in a renderer supporting the NVIDIA Material Definition Language (MDL) which supports four different EDFs, mixing of EDFs, different EDFs on the front and back face of thin-walled geometry, cutout-opacity(!), and any procedural evaluation of the emission color and intensity. The explicit light sampling for that extremely complex case alone is just 200 lines of CUDA C++ device code because of elegant usage of bindless callable programs (similar as for BSDFs).
For simpler material systems this should be even less.
The EDF evaluation happens in a per-material bindless callable program generated at runtime when loading the MDL material into the scene. No need to do it like that if you have only one kind of EDF (normally diffuse).

If the explicit light sampling function is implemented as a bindless callable program, you can call it in either the closest hit or ray generation programs. That comes in handy when calculating direct lighting for in-scattering of subsurface scattering materials or more deferred light transport algorithms.

Some words of warning:

  • Since the triangle area is used in the denominator when calculating the light sample PDF, you need to make sure that the floating point precision isn’t exceeded for very tiny triangles. The CDF makes sure to not sample zero area triangles.
  • If the distance between the light and the surface point gets too small, the normalization of the direction vector fails.
  • Depending on the mesh topology (e.g. closed concave mesh) more than half of the explicit samples will be invalid.
  • Mind that the default triangle intersection routine is not watertight. When putting EDFs only on the inside of geometry, you will get light leaking between triangles. (Well, that’s a rather esoteric case.)

Thank you for detailed reply!

I understand that by using bindless buffers I can avoid creating a massive light geometry buffer even for instanced lights, and that by applying the transform in device code I can avoid transforming on the host side.
(For instanced lights, there are multiple "LightDefinition"s that refer to the same geometry buffer, right?)

However I still have a concern.
With the approach you described, do you mean that I need to manually flatten the scene graph to build the LightDefinition buffer, manually resolve the hierarchical transforms, and then assign the resolved transform to each LightDefinition?

If we assume the scene graph in the attached image (where the GeometryInstance contains emitting primitives), does the result become 4 flattened LightDefinitions?

First of all, the scene graph in your image is not possible in OptiX.
Transform nodes can only have Group, Selector, GeometryGroup, and Transform as child.

Yes, you need one for each concatenated transformation along the path from scene root to light geometry.

Basically yes, on application side you need to calculate the concatenated transformation to each of your Geometry leaves which contain a light and put that into the buffer (array of structs) of LightDefinitions. The size of that array defines the number of lights in your scene.
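The application-side flattening described here, concatenating the transforms along each path from the scene root down to a light leaf, can be sketched like this. The node and struct layouts are invented for illustration; a real scene graph would carry more data per node.

```cpp
#include <cassert>
#include <vector>

// Minimal application-side scene node: a local transform plus children, with
// leaves optionally flagged as light geometry. All names are hypothetical;
// the 4x4 matrices are row-major.
struct SceneNode {
  float transform[16];
  std::vector<SceneNode*> children;
  bool isLight = false;
  int lightGeometryID = -1; // would reference the bindless buffer IDs
};

struct FlatLight {
  float objectToWorld[16]; // concatenated transform from root to the leaf
  int geometryID;
};

void mul4x4(const float* a, const float* b, float* out) {
  for (int r = 0; r < 4; ++r)
    for (int c = 0; c < 4; ++c) {
      float s = 0.0f;
      for (int k = 0; k < 4; ++k) s += a[r * 4 + k] * b[k * 4 + c];
      out[r * 4 + c] = s;
    }
}

// Depth-first traversal that concatenates transforms down each path and emits
// one FlatLight per light leaf reached -- one LightDefinition per instance,
// even when several paths share the same leaf geometry.
void collectLights(const SceneNode& node, const float* parent,
                   std::vector<FlatLight>& out) {
  float world[16];
  mul4x4(parent, node.transform, world);
  if (node.isLight) {
    FlatLight l;
    for (int i = 0; i < 16; ++i) l.objectToWorld[i] = world[i];
    l.geometryID = node.lightGeometryID;
    out.push_back(l);
  }
  for (const SceneNode* c : node.children) collectLights(*c, world, out);
}
```

A leaf reachable through two different Transform parents then simply produces two entries in the LightDefinition buffer, which matches the "4 flattened LightDefinitions" expectation above.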

The OptiX scene graph itself remains unchanged, because that handles the implicit light hits in the closest hit program, which can then also do the direct lighting by calling the explicit light sampling function for each light and type.

In general, if your application allows interactive changes to the scene data, it needs to be able to access and manipulate the data in its own scene graph as needed. So traversing that to build the concatenated transforms needs to be possible.
That application side scene graph can be as articulated as you like, for example when handling a kinematic hierarchy for animated skeletons, etc.
Though for best rendering(!) performance, and that is true for both rasterization and ray tracing, the renderer-side scene graph should be as shallow as possible.
This means it’s actually recommended to flatten the renderer-side scene graph to at most a two-level hierarchy, i.e. no more than two Acceleration nodes on the path from the scene root to any Geometry. Effectively only a single level of transforms is needed in the end, which is also the transformation for the mesh lights.

(Our own nvpro-pipeline which, at its core, is scene graph and effects framework with an OpenGL renderer backend, goes to great lengths to optimize the renderer data that way. It also uses some fast methods to speed up the validation and updates of the host scene graph. You’ll find its source code and many presentations on github.com as well.)

I’m explaining some fundamental OptiX scene graph layouts in my GTC 2018 talk which you can find when following the links in the sticky post.

Ah, right. Changing the GeometryGroup in the middle to Group and inserting new GeometryGroup above the GeometryInstance makes it what I wanted to express.

        Group A
       /       \
  Trans A     Trans B
       \       /
        Group B
       /       \
  Trans C     Trans D
       \       /
    GeometryGroup
          |
   GeometryInstance
       /       \
  Geometry    Material

Indeed, I saw similar advice in the programming guide.

Therefore, does this suggest that we should avoid multi-level instancing even when it is possible?
(If multi-level instanced lights are desired, I guess system-level support like the rtVisit() I wrote about in the original post would be useful, at least from the perspective of productivity … forget it, haha :) )

I will forget about multi-level instancing for the time being and try to implement the approach you explained.


Yup, do what you need to do on the application side, but for optimal rendering performance deep hierarchies will simply slow down the BVH traversal, as would explicit visit decisions.

This should be faster:

                      Group -> Acceleration
         /          /            \            \
Transform AC  Transform AD  Transform BC  Transform BD
         \          \            /            /
                   GeometryGroup -> Acceleration
                 GeometryInstance -> Material

By the way, if I use instanced lights (sharing the same geometry) and I’d like to implement multiple importance sampling, the closest hit program needs to distinguish the instanced lights when a ray hits a light (the implicit light sampling calculation).

Is there a way to distinguish instanced geometries in a closest hit program?


Unfortunately no. An instance ID is a long-standing feature request in OptiX but hasn’t happened yet.

It’s sad.
I like OptiX, but the current version seems to lack functionality related to NEE (including MIS). I hope the next version will enhance this :)

My current solution for that is to not instance the whole subtree, but to share the geometry and acceleration structure only.
Slide 9 in this presentation: [url]http://on-demand.gputechconf.com/gtc/2018/presentation/s8518-an-introduction-to-optix.pdf[/url]

Thanks, the workaround you showed seems like a good alternative.
I’ll consider if this workaround can be integrated into my renderer system!

I wonder if your workaround is beneficial compared to what is described in figure 3.4 of the programming guide (ver. 5.1.0).
I think that setup becomes simpler by simply using a single GeometryGroup and making it a child of both Transforms.
Is that just an example of sharing an acceleration structure?

That is just another example of sharing acceleration structures among GeometryGroups.
In this case both GeometryGroups still contain exactly the same geometry, just with a different hierarchy.
My diagrams show just the minimal scene graph case.
