Question about Instance Acceleration Structure

aister91 · February 23, 2024, 11:58pm

Hi there, I am confused about IAS.
I want two batching models in the scene; both models have the same geometry.

Suppose I build the IAS like this:

Instance 0: untransformed instance of GAS 0
Instance 1: transformed instance of GAS 0
Instance 2: Instance 0 + Instance 1

In this case, I have three traversable instances.

Then I render three images using three trace operations.

for i in 0 .. instances.size() - 1:
    Trace(traversable[i])

Is it possible to implement this using one GAS?
Or are there any similar examples?

droettger · February 26, 2024, 8:54am

I’m not sure what “batching models” means in your description and why you would need to do it this way.
If you need to render model number 0, then model 1, and then both models, I assume your Trace() function issues all optixTrace calls for one model (traversable handle) before issuing all optixTrace calls for the next? That is, you’re not shooting the same ray into three different scene setups?

There are multiple ways to implement that with OptiX. Here are three options:

Inside the optixTrace call, the traversableHandle argument defines what your scene root is.
That defines your world space and all elements which can be reached during the acceleration structure (AS) traversal.
That means if you want to shoot rays into different scenes using the same pipeline programs and shader binding table entries, you just need different traversableHandles to pass as the argument to optixTrace.

You would normally not build an instance AS and then trace only against the individual OptixInstances inside it, which wasn’t even possible in previous OptiX versions.

The fastest traversal performance on RTX boards is achieved with the IAS->GAS scene structure (that means OptixPipelineCompileOptions traversableGraphFlags = OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_LEVEL_INSTANCING).

Following your idea you could implement the following:
traversableHandle[0]: Build an IAS with a single OptixInstance with identity matrix and GAS0 as child.
traversableHandle[1]: Build an IAS with a single OptixInstance with some transform matrix and GAS0 as child.
traversableHandle[2]: Build an IAS with two OptixInstances, one with the identity matrix and GAS0, and the other with the transform matrix and GAS0.
(If your Instance 2 above actually has another transform on top, merge that into the two OptixInstance transforms.)
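
A minimal host-side sketch of the three scene setups listed above, assuming simplified stand-in types: InstanceDesc mirrors only a few OptixInstance fields, TraversableHandle stands in for OptixTraversableHandle, and makeSceneInstanceLists is a hypothetical helper name. The actual IAS builds (one optixAccelBuild per instance list) are left out.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for OptixTraversableHandle (an opaque 64-bit value in OptiX).
using TraversableHandle = std::uint64_t;

// Simplified mirror of the OptixInstance fields used in this sketch.
// The real struct has more fields (sbtOffset, flags, padding).
struct InstanceDesc {
    std::array<float, 12> transform;  // 3x4 row-major object-to-world matrix
    unsigned int instanceId;
    unsigned int visibilityMask;
    TraversableHandle child;          // the GAS this instance references
};

constexpr std::array<float, 12> kIdentity = {
    1.0f, 0.0f, 0.0f, 0.0f,
    0.0f, 1.0f, 0.0f, 0.0f,
    0.0f, 0.0f, 1.0f, 0.0f};

// Returns the instance lists for the three scenes. Each list would be the
// input of one IAS build. Every instance references the same GAS, so the
// geometry exists only once.
inline std::array<std::vector<InstanceDesc>, 3> makeSceneInstanceLists(
    TraversableHandle gas0, const std::array<float, 12>& xform)
{
    assert(gas0 != 0);  // this sketch treats 0 as "no GAS built yet"

    const InstanceDesc plain{kIdentity, 0u, 255u, gas0};
    const InstanceDesc moved{xform, 1u, 255u, gas0};

    std::array<std::vector<InstanceDesc>, 3> scenes;
    scenes[0].push_back(plain);   // scene 0: untransformed model only
    scenes[1].push_back(moved);   // scene 1: transformed model only
    scenes[2].push_back(plain);   // scene 2: both models
    scenes[2].push_back(moved);
    return scenes;
}
```

Each of the three lists would then be uploaded and built into its own IAS, yielding the three top-level traversable handles.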

Each of these scenes is identified by a top-level traversable handle which has an IAS->GAS structure for optimal performance!
Now place these three traversable handles into your launch parameter block, and then you can loop over them inside the ray generation program and process them one after the other.

This would work with arbitrarily many traversableHandles when you place them into a buffer and put just the pointer and size into the launch parameters which are limited to 64kB constant memory.
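
As a rough sketch of that layout, assuming a hypothetical LaunchParams struct and a CPU stand-in for the loop in the ray generation program (traceFn takes the place of the per-scene optixTrace call, and in a real application sceneHandles would be a device pointer):

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for OptixTraversableHandle.
using TraversableHandle = std::uint64_t;

// Hypothetical launch parameter block: only the pointer and the count live
// in the launch parameters, the handles themselves live in a buffer.
struct LaunchParams {
    const TraversableHandle* sceneHandles;
    unsigned int numScenes;
    unsigned int width;
    unsigned int height;
};

// The block stays tiny no matter how many scenes the buffer holds.
static_assert(sizeof(LaunchParams) <= 64 * 1024,
              "launch parameters must fit into 64 kB");

// CPU stand-in for the loop inside the ray generation program.
template <typename TraceFn>
void traceAllScenes(const LaunchParams& params, TraceFn traceFn)
{
    assert(params.sceneHandles != nullptr || params.numScenes == 0);
    for (unsigned int i = 0; i < params.numScenes; ++i) {
        traceFn(i, params.sceneHandles[i]);  // one trace per scene root
    }
}
```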

A less complicated scene setup would be the following using OptixVisibilityMask:
The OptixInstance structure contains a visibilityMask field which is declared as unsigned int but only holds an 8-bit value.
The optixTrace function also has an OptixVisibilityMask argument, and if the AND operation of these two values is not zero, the instance is traversed. That means this works like a DIP switch with 8 switches.

So if you only build the third IAS->GAS structure containing two OptixInstance entries which both reference GAS0, and then set the visibilityMask of the untransformed instance to 1 (bit 0 is set) and that of the transformed instance to 2 (bit 1 is set), you can select which of the three cases you want to traverse simply by setting the visibilityMask inside the optixTrace call to 1 for the untransformed instance of GAS0, to 2 for the transformed instance of GAS0, and to 3 for both instances.
(In case you’re tracing one ray into the three different scene setups, this would be the better method.)
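
The mask selection can be modelled on the CPU like this. It is only an illustration of the AND rule described above, not OptiX code; the function and constant names are made up for the sketch.

```cpp
#include <cassert>

// An instance is traversed when (instance mask AND ray mask) is not zero.
// Only the lowest 8 bits take part in the comparison.
constexpr bool instanceIsTraversed(unsigned int instanceMask, unsigned int rayMask)
{
    return ((instanceMask & rayMask) & 0xFFu) != 0u;
}

constexpr unsigned int kMaskUntransformed = 1u;  // bit 0
constexpr unsigned int kMaskTransformed   = 2u;  // bit 1
constexpr unsigned int kMaskBoth = kMaskUntransformed | kMaskTransformed;  // 3

// Counts how many of the two instances a ray with the given mask would
// traverse in the single two-instance IAS.
inline int countTraversedInstances(unsigned int rayMask)
{
    assert(rayMask <= 0xFFu);  // only 8 mask bits are available
    const unsigned int instanceMasks[2] = {kMaskUntransformed, kMaskTransformed};
    int count = 0;
    for (unsigned int mask : instanceMasks) {
        if (instanceIsTraversed(mask, rayMask)) {
            ++count;
        }
    }
    return count;
}
```

With these values, a ray mask of 1 or 2 reaches exactly one instance, and 3 (or 255) reaches both.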

Building an instance AS is very fast, and instance ASes do not require a lot of memory, especially not in your example case, so the idea with the three top-level traversable handles is probably the fastest method.
The one with the OptixVisibilityMask is more elegant.
But if I understood the idea correctly, I would first try to simply launch three times with three different traversable handles.
Example code for updating the launch parameters asynchronously is here (that code just sets the sub-frame iteration index for each optixLaunch in the benchmark mode):
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/MDL_renderer/src/Device.cpp#L1880

Related threads with caveats about not tracing against the top-level traversable handle inside a render graph:
https://forums.developer.nvidia.com/t/traverse-a-bvh-from-a-specific-node/262208
https://forums.developer.nvidia.com/t/how-to-use-optix-trace-a-child-instance/272378

aister91 · February 26, 2024, 2:34pm

Thank you for your response.

Indeed, as you guessed, I am looking to render three different scenes simultaneously. As I described, the scenario involves displaying two models on the screen that share a single model’s information. In my work, I have been copying the primitive information each time in my approach.

What I wanted to inquire about is whether, when using OptixInstance, referencing the same GAS allows for only one model to be used in the actual GPU memory. Despite searching through existing tutorials, they seemed to imply the creation of new primitives for each instance. For reference, see OptiX_Apps/apps/intro_motion_blur/src/Application.cpp (master branch) in the NVIDIA/OptiX_Apps repository on GitHub.

Therefore, as you wrote in the example, I created three primitive scenes and used three handles for the GAS, which has memory overhead.

If I understand correctly, I can experiment with the idea you explained in a single Launch.

Instance 0: a single OptixInstance with a visibility mask of 1 (bit 0 set), an identity matrix, and GAS0 as a child.
Instance 1: a single OptixInstance with a visibility mask of 2 (bit 1 set), some transformation matrix, and GAS0 as a child.
IAS: combine instances 0 and 1.

Since the same set of rays is used, one Launch would call Trace three times:

Trace(…, visible mask(1), …)
Trace(…, visible mask(2), …)
Trace(…, visible mask(255), …)

Do I understand this correctly?

In this case, how would the instance ID and SBT operate? That is, do the instances need to be configured to have the same SBT offset?

droettger · February 26, 2024, 5:51pm

“In my work, I have been copying the primitive information each time in my approach.”

That shouldn’t be necessary if you can instance one GAS multiple times.

“What I wanted to inquire about is whether, when using OptixInstance, referencing the same GAS allows for only one model to be used in the actual GPU memory.”

Exactly. You described that in your scene setup already when using the GAS0 multiple times under different instances.
That’s the beauty of ray tracing. You can instance geometry many times that way and put triangle counts on the screen which you wouldn’t be able to render in a reasonable time with a rasterizer without serious culling tricks.

“Despite searching through existing tutorials, they seemed to imply the creation of new primitives for each instance.”

I’m creating new GASes inside the intro examples to keep them very simple. They could easily use the same GAS under different instances.
For example, the two createSphere() calls with the same arguments could also use just the first sphere two times:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/intro_runtime/src/Application.cpp#L1642

My more elaborate examples build a unique key per GAS to be able to instance the geometry automatically.
E.g. search for keyGeometry in here:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/MDL_renderer/src/Application.cpp
The simple host scene graph I’m using takes care of the OptixInstance building later.
I’m assigning the material at the instance level, so the same geometry can be used with different materials.
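
The idea of keying geometry so that identical meshes share one GAS can be sketched as a small cache. This is not the code from the linked example; GeometryCache, the string key format, and the build callback are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Stand-in for OptixTraversableHandle.
using TraversableHandle = std::uint64_t;

// Hypothetical cache: builds a GAS only the first time a key is seen and
// returns the existing handle for every later request with the same key.
class GeometryCache {
public:
    using BuildFn = std::function<TraversableHandle(const std::string&)>;

    explicit GeometryCache(BuildFn build) : m_build(std::move(build))
    {
        assert(static_cast<bool>(m_build));  // a build callback is required
    }

    TraversableHandle getOrBuild(const std::string& key)
    {
        const auto it = m_cache.find(key);
        if (it != m_cache.end()) {
            return it->second;  // reuse: a new instance of an existing GAS
        }
        const TraversableHandle handle = m_build(key);  // first use: build once
        m_cache.emplace(key, handle);
        return handle;
    }

    std::size_t size() const { return m_cache.size(); }

private:
    BuildFn m_build;
    std::map<std::string, TraversableHandle> m_cache;
};
```

Two objects created with the same parameters then map to the same key and end up as two OptixInstances over a single GAS.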

The ASSIMP mesh loader routine can reuse whole models as well. It references the scene graph root node. Due to the flattening of the scene graph to an IAS->GAS, the individual geometry inside the model will result in separate OptixInstances. To instance the whole thing, there would need to be a multi-level render graph (IAS->IAS->GAS) and some changes to the material assignments.
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/MDL_renderer/src/Assimp.cpp#L56

“Since the same set of rays is used, one Launch would call Trace three times:”

Yes, for that you need only the single GAS and one IAS with two OptixInstances, one with the identity matrix and one with the transformation matrix placing the GAS at two different positions inside your scene.
Mind that you then need to transform your object space vertex attributes into the world space.
With a two-level IAS->GAS structure that is very simple, and I wrote my own specialized transform routines for that fastest scene layout.
Search the example code for where these are used:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/MDL_renderer/shaders/transform.h
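
A plain C++ sketch of what such single-level transforms compute, assuming 3x4 row-major matrices as stored in OptixInstance::transform. The type and function names here are illustrative and not copied from the linked header.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Float3 { float x, y, z; };
using Mat3x4 = std::array<float, 12>;  // 3x4 row-major affine matrix

// Points use the full affine transform, including the translation column.
inline Float3 transformPoint(const Mat3x4& m, const Float3& p)
{
    return { m[0] * p.x + m[1] * p.y + m[2]  * p.z + m[3],
             m[4] * p.x + m[5] * p.y + m[6]  * p.z + m[7],
             m[8] * p.x + m[9] * p.y + m[10] * p.z + m[11] };
}

// Directions use only the upper-left 3x3 part (no translation).
inline Float3 transformVector(const Mat3x4& m, const Float3& v)
{
    return { m[0] * v.x + m[1] * v.y + m[2]  * v.z,
             m[4] * v.x + m[5] * v.y + m[6]  * v.z,
             m[8] * v.x + m[9] * v.y + m[10] * v.z };
}

// Normals use the transpose of the inverse (world-to-object) 3x3 part.
inline Float3 transformNormal(const Mat3x4& worldToObject, const Float3& n)
{
    const Mat3x4& w = worldToObject;
    return { w[0] * n.x + w[4] * n.y + w[8]  * n.z,
             w[1] * n.x + w[5] * n.y + w[9]  * n.z,
             w[2] * n.x + w[6] * n.y + w[10] * n.z };
}

inline Float3 normalize(const Float3& v)
{
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(len > 0.0f);  // zero-length vectors cannot be normalized
    return { v.x / len, v.y / len, v.z / len };
}
```

Normals need the inverse matrix so that they stay perpendicular to the surface under non-uniform scaling, which is why they are handled separately from ordinary direction vectors.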

The intro_motion_blur example is using the general purpose transformation helper functions provided by the OptiX SDK. These are unnecessarily complex for the IAS->GAS structure which has only exactly one transform list entry.

Read the README.md in that example repository and look at the rtigo10 and rtigo12 examples for the fastest and smallest SBT implementation.

The OptixInstance contains an sbtOffset field which allows selecting a specific hit record start index inside the SBT.
The user defined instanceId field can be used to access arbitrary data per instance. That is how my later examples access the vertex attributes, indices, and material and light parameters.
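
A small sketch of that per-instance lookup, assuming a hypothetical InstanceData table indexed by instanceId. In device code the index would come from optixGetInstanceId(); here it is passed in as a plain argument.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-instance record. Two instances of the same GAS can share
// the geometry pointers while using different materials.
struct InstanceData {
    const float* positions;       // vertex positions of the referenced mesh
    const unsigned int* indices;  // triangle indices of the referenced mesh
    int materialIndex;            // material assigned at the instance level
};

// Looks up the record belonging to the instance that was hit.
inline const InstanceData& lookupInstanceData(const std::vector<InstanceData>& table,
                                              unsigned int instanceId)
{
    assert(instanceId < table.size());
    return table[instanceId];
}
```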

“That is, do the instances need to be configured to have the same SBT offset?”

That depends on what you want your device programs to be per instance. If they should use the same hit record, just set the sbtOffset to zero. You only need to be able to access your vertex attributes and indices somehow.

The optixTrace “SBT” arguments allow selecting different SBT records in addition to implementing different ray types.
The effective SBT index is calculated with the formula in this OptiX Programming Guide chapter:
https://raytracing-docs.nvidia.com/optix8/guide/index.html#shader_binding_table#accelstruct-sbt
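
For orientation, the formula from that chapter adds the instance sbtOffset, the GAS-local SBT index multiplied by the optixTrace SBT stride, and the optixTrace SBT offset. A sketch with made-up parameter names follows; the bounds check against numHitRecords is an addition for this example only.

```cpp
#include <cassert>

// Effective hit record index:
//   instance sbtOffset + (GAS SBT index * trace SBT stride) + trace SBT offset
inline unsigned int hitRecordIndex(unsigned int instanceSbtOffset,
                                   unsigned int gasSbtIndex,
                                   unsigned int traceSbtStride,
                                   unsigned int traceSbtOffset,
                                   unsigned int numHitRecords)
{
    const unsigned int index =
        instanceSbtOffset + gasSbtIndex * traceSbtStride + traceSbtOffset;
    assert(index < numHitRecords);  // must address an existing hit record
    return index;
}
```

For example, with two ray types (stride 2) and one hit record pair per instance, the second instance would get sbtOffset 2, so ray type 1 hitting it selects record 3. If both instances should share the same records, both use sbtOffset 0.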

Some threads about render graphs and SBTs explaining various possible SBT layouts:
https://forums.developer.nvidia.com/t/passing-per-vertex-attribute-data-into-a-shader-program/279321
https://forums.developer.nvidia.com/t/sbt-theoretical-quesions/179309
https://forums.developer.nvidia.com/t/creating-multiple-pipelines-with-different-raygen/202826/2
https://forums.developer.nvidia.com/t/question-about-sbts/158357/2
https://forums.developer.nvidia.com/t/optixwhitted-how-to-insert-scene-from-opennurbs-on-mesh/278134/5
https://forums.developer.nvidia.com/t/index-each-gltf-imported-triangle-with-a-unique-primitiveid/111093/8
