OptiX PathTracer: how to implement updateGeometry functionality

Hi,

I am trying to implement a test function that adds new geometry, like a new box, to the scene of optixPathTracer. I would like to leave the user-interaction part aside completely and focus on the OptiX part. Let's simply assume that by clicking a button I can change/increase the size of const static std::array<Vertex, TRIANGLE_COUNT * 3> g_vertices and all the related scene data arrays and variables.

Now on the OptiX side, I suppose the right place to implement such functionality is updateState(), which already handles the camera update and resize events:

void updateState( sutil::CUDAOutputBuffer<uchar4>& output_buffer, Params& params )
{
    // Update params on device
    if( camera_changed || resize_dirty )
        params.subframe_index = 0;

    handleCameraUpdate( params );
    handleResize( output_buffer, params );
}

After pressing a button I have a new box in the Scene data and I would like to render it.
I have two basic questions on this:

  1. Which part of the OptiX pipeline should be rerun in updateState() to make OptiX aware that I have changed the scene data?
  2. Is it possible to save some time and upload only the extra (new) vertices instead of copying the entire new g_vertices to the device?

Thanks!

First, when you want to update the render graph (Instance and Geometry Acceleration Structures (IAS, GAS)) inside an OptiX application you need to call optixAccelBuild for all ASes which changed.

I don’t like explaining that for the optixPathTracer example inside the SDK because that is only using a hardcoded geometry acceleration structure which isn’t really what you want when adding objects to a scene.

The better render graph structure for that is a two-level hierarchy: a top-level IAS with OptixInstances which reference GASes. That will also immediately allow geometry reuse via instancing.
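For illustration, one such instance on the host side could look roughly like this (a minimal sketch; gas_handle and sbt_offset are placeholders, not names from the SDK sample):

#include <optix.h>
#include <cstring>

// Minimal sketch of one OptixInstance entry for a top-level IAS build input.
OptixInstance makeInstance( OptixTraversableHandle gas_handle, unsigned int sbt_offset )
{
    OptixInstance instance = {};

    // Row-major 3x4 object-to-world matrix (identity here).
    const float transform[12] = { 1.0f, 0.0f, 0.0f, 0.0f,
                                  0.0f, 1.0f, 0.0f, 0.0f,
                                  0.0f, 0.0f, 1.0f, 0.0f };
    memcpy( instance.transform, transform, sizeof( float ) * 12 );

    instance.instanceId        = 0;                        // User-defined ID, readable in device code.
    instance.sbtOffset         = sbt_offset;               // Selects the SBT (material) records of this GAS.
    instance.visibilityMask    = 255;
    instance.flags             = OPTIX_INSTANCE_FLAG_NONE;
    instance.traversableHandle = gas_handle;               // Handle returned by optixAccelBuild for the GAS.

    return instance;
}

// The array of OptixInstance structs is copied to the device and referenced by an
// OptixBuildInput of type OPTIX_BUILD_INPUT_TYPE_INSTANCES when building the IAS.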

Acceleration structure rebuilds or updates need to happen as follows:

  1. When you change the topology of the geometry, that is, changing the number of vertices or changing any of the indices in an indexed mesh, you must rebuild that GAS.
    That will produce a new traversable handle which you need to put into the OptixInstance above it.

  2. When you only change the vertex positions inside a geometry (like when morphing a mesh), you also must call optixAccelBuild, but it can be either a rebuild or an update operation (see the refit sketch after this list).
    Updating (“refitting”) AS requires that the initial AS was built with the OPTIX_BUILD_FLAG_ALLOW_UPDATE flag.
    https://raytracing-docs.nvidia.com/optix8/guide/index.html#acceleration_structures#dynamic-updates
    But there can be cases where updates degrade the performance of the AS traversal and a full rebuild would improve that again.
    Explained here: https://forums.developer.nvidia.com/t/updating-as-leads-to-extremely-low-traversal-performance/267416

  3. When any GAS inside the render graph changes, all IAS inside the graph which can reach the changed GAS must be rebuilt or updated with optixAccelBuild as well.
    That is necessary because any change to the AABBs of the lower AS must be reflected inside the BVH.
    With a two-level IAS->GAS structure that means you must call optixAccelBuild on the top-level IAS every time anything changes inside the GASes attached to it. Building an IAS is very quick.

  4. When changing transform matrices inside an OptixInstance inside an IAS, the IAS must be rebuilt or updated.
    All IAS above that changed IAS must also be rebuilt or updated since the AABBs changed.

  5. When the number of OptixInstances inside an IAS changes, that is, when adding or removing instances, you must rebuild the whole IAS. An update won't work because the scene topology changed.
    All IAS above that changed IAS must also be rebuilt or updated since the AABBs changed.

  6. When using motion transforms (linear or SRT) inside the render graph and these transforms change, all IAS above them must be rebuilt or updated. Usually an update is fine because only the AABBs of the children changed, though with motion ASes that can become more involved.
    Shown in this example: https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/intro_motion_blur/src/Application.cpp#L2490
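As a concrete illustration of case 2, a GAS refit is simply another optixAccelBuild call with the update operation. A minimal sketch (the variable names are assumptions, not code from the SDK sample; the build flags must match those of the initial build):

// Refit sketch for case 2: vertex positions changed, topology unchanged.
// Assumed variables: context, stream, d_vertices (device pointer already holding the new
// positions), num_vertices, d_temp_buffer (at least tempUpdateSizeInBytes large),
// d_gas_output_buffer, gas_output_buffer_size, gas_handle.
const uint32_t triangle_flags = OPTIX_GEOMETRY_FLAG_NONE;

OptixAccelBuildOptions accel_options = {};
accel_options.buildFlags = OPTIX_BUILD_FLAG_ALLOW_UPDATE;   // Same flags as the initial build.
accel_options.operation  = OPTIX_BUILD_OPERATION_UPDATE;    // Refit instead of full rebuild.

OptixBuildInput triangle_input = {};
triangle_input.type                        = OPTIX_BUILD_INPUT_TYPE_TRIANGLES;
triangle_input.triangleArray.vertexFormat  = OPTIX_VERTEX_FORMAT_FLOAT3;
triangle_input.triangleArray.numVertices   = num_vertices;  // Must match the original build.
triangle_input.triangleArray.vertexBuffers = &d_vertices;
triangle_input.triangleArray.flags         = &triangle_flags;
triangle_input.triangleArray.numSbtRecords = 1;

OptixAccelBufferSizes sizes = {};
OPTIX_CHECK( optixAccelComputeMemoryUsage( context, &accel_options, &triangle_input, 1, &sizes ) );
// An update only needs sizes.tempUpdateSizeInBytes of temporary memory.

OPTIX_CHECK( optixAccelBuild( context,
                              stream,                // Same stream as optixLaunch, so the refit cannot overlap rendering.
                              &accel_options,
                              &triangle_input, 1,
                              d_temp_buffer, sizes.tempUpdateSizeInBytes,
                              d_gas_output_buffer,   // Updated in place; the handle stays valid.
                              gas_output_buffer_size,
                              &gas_handle,
                              nullptr, 0 ) );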

Note that there is also the optixDynamicGeometry example inside the OptiX SDK which shows this.

Now with all that said, if you still want to use the optixPathTracer as a basis for your experiments, updating the GAS inside it would require code changes in various places, because all the geometry and the material assignments to the individual primitives are hardcoded and only work together.

Means you wouldn't only need to change the g_vertices array but also g_mat_indices. These should all become non-const and be put into std::vectors when you plan to update them.
If you want to add more materials dynamically, everything based on MAT_COUNT would also need to be made dynamic; that includes the SBT.

Then you would need to call buildMeshAccel( state ); with the updated data. But that function is not meant to be called more than once inside the current code and must be changed to handle dynamic memory allocations, means it needs to be able to free and re-allocate state.d_vertices and state.d_gas_output_buffer each time it's called and not only free them at program shutdown.

Note that the program is limited to the four hardcoded materials (MAT_COUNT), which means you wouldn't need to change the SBT size, but you must update the additional data of the hitgroup_records. Search the code for data.vertices.
That happens inside createSBT(), which again is not meant to be called more than once inside that application. You would again need to change that, or build an update version of that function which handles the dynamic frees, allocations, and data updates every time you change the geometry data.

I would still recommend implementing an IAS->GAS scene structure where you can more easily update the scene by adding or removing complete GAS. If each of them has a single SBT record, updating the SBT accordingly isn’t difficult either and the IAS sbtOffset is basically selecting the material shader with this formula: https://raytracing-docs.nvidia.com/optix8/guide/index.html#shader_binding_table#accelstruct-sbt
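For reference, that formula computes the SBT record index during traversal roughly as (paraphrased from the linked documentation):

sbt_index = sbt_instance_offset                          // OptixInstance::sbtOffset of the hit instance
          + sbt_GAS_index * sbt_stride_from_trace_call   // build-input/SBT-record index inside the GAS times the SBTstride argument of optixTrace
          + sbt_offset_from_trace_call                   // SBToffset argument of optixTrace

With one SBT hit record per GAS and an SBT stride of 1, the instance sbtOffset then directly selects the hit record, and with it the material shader.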

Related thread with AS and SBT tips and tricks: https://forums.developer.nvidia.com/t/question-about-instance-acceleartion-struction/283898/4

Thank you very much for your response, OptiX instancing is definitely my next stop! For the moment, I am trying to understand some basic OptiX pipeline concepts by tweaking the path tracer sample.
So, let’s say that we have the main rendering loop:

while (true)
{
   updateState(output_buffer, state.params);
   launchSubframe(output_buffer, state);
   //display functions
}

Now, in order to add a box of the same material I have implemented the appropriate functions to create the required vectors (g_vertices, g_mat_indices, etc.). I know these vectors are set up properly because when I build the whole geometry from the beginning I can see it rendered. The problem is how to add the boxes one by one.
I tried to follow the guidelines you provided but I haven't managed anything other than exception errors :)

while (true)
{
    if (_scene->geometry_changed) {
        CUDA_CHECK(cudaFree(reinterpret_cast<void*>(state.d_vertices)));
        CUDA_CHECK(cudaFree(reinterpret_cast<void*>(state.d_gas_output_buffer)));
        buildMeshAccel(state);

        CUDA_CHECK(cudaFree(reinterpret_cast<void*>(state.d_vertices)));
        createSBT(state);
    }
    launchSubframe(output_buffer, state);
    //display functions
}

I know all this looks a bit naive, as the path tracer is not the sample that demonstrates dynamic geometry changes, but somehow I think I need to get a better idea of how all the parts of the pipeline combine to render the geometry before moving to more advanced samples.

Thanks :)

Please read what I described above again.

You must not free state.d_vertices before the createSBT() call because that function stores the device pointer to the vertices into all hit record data entries. Look for: hitgroup_records[sbt_idx].data.vertices = reinterpret_cast<float4*>( state.d_vertices );

That pointer is required inside the device code to calculate the vertex attributes.
Look what the optixPathTracer.cu does after this line: HitGroupData* rt_data = (HitGroupData*)optixGetSbtDataPointer();
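Roughly what happens there (paraphrased, not a verbatim copy of the SDK code):

// Paraphrased from the closest-hit program in optixPathTracer.cu:
HitGroupData* rt_data = (HitGroupData*)optixGetSbtDataPointer();

const int prim_idx = optixGetPrimitiveIndex();

// The vertex positions are read through the device pointer stored in the SBT record.
// If state.d_vertices was freed on the host, these reads access invalid memory.
const float3 v0 = make_float3( rt_data->vertices[ prim_idx * 3 + 0 ] );
const float3 v1 = make_float3( rt_data->vertices[ prim_idx * 3 + 1 ] );
const float3 v2 = make_float3( rt_data->vertices[ prim_idx * 3 + 2 ] );

// Geometric normal of the hit triangle, used for the shading afterwards.
const float3 N = normalize( cross( v1 - v0, v2 - v0 ) );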

Means your code will crash with a CUDA invalid memory access error.

Again, you should change the buildMeshAccel() function itself to allow calling it more than once. For that you should move your two cudaFree() calls on state.d_vertices and state.d_gas_output_buffer into that function and implement it to handle vertex updates robustly. Do not sprinkle code with side effects throughout your application code.
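A minimal sketch of how the top of that function could look (assuming g_vertices has been turned into a non-const container as suggested earlier; the rest of the function stays as in the SDK):

void buildMeshAccel( PathTracerState& state )
{
    // Make the function safe to call more than once:
    // free the allocations of the previous build before creating new ones.
    if( state.d_vertices )
    {
        CUDA_CHECK( cudaFree( reinterpret_cast<void*>( state.d_vertices ) ) );
        state.d_vertices = 0;
    }
    if( state.d_gas_output_buffer )
    {
        CUDA_CHECK( cudaFree( reinterpret_cast<void*>( state.d_gas_output_buffer ) ) );
        state.d_gas_output_buffer = 0;
    }

    // Upload the current host-side vertex data.
    const size_t vertices_size_in_bytes = g_vertices.size() * sizeof( Vertex );
    CUDA_CHECK( cudaMalloc( reinterpret_cast<void**>( &state.d_vertices ), vertices_size_in_bytes ) );
    CUDA_CHECK( cudaMemcpy( reinterpret_cast<void*>( state.d_vertices ),
                            g_vertices.data(), vertices_size_in_bytes,
                            cudaMemcpyHostToDevice ) );

    // ... the rest of the GAS build (build input setup, optixAccelComputeMemoryUsage,
    //     optixAccelBuild, compaction) continues as in the original SDK code and writes
    //     the new handle into state.gas_handle and the new buffer into state.d_gas_output_buffer.
}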

Then I explained that you should either change the createSBT() function so that it can be called more than once, or implement a variant of it which just updates the necessary data.
If you look at the createSBT() code, all the cudaMalloc() calls inside there must not leak memory when calling it more than once. It’s your responsibility to take care of that.

Also whenever you change the scene data (geometry, material colors, etc.) you must restart the progressive accumulation of the path tracer.
Inside the optixPathTracer that is done by resetting the launch parameter subframe_index to zero, which happens in updateState(), so that if-statement needs to handle your _scene->geometry_changed condition as well, and that function needs to be called, or you'll get a blurry mess.

void updateState( sutil::CUDAOutputBuffer<uchar4>& output_buffer, Params& params )
{
    // Update params on device
    if( camera_changed || resize_dirty )
        params.subframe_index = 0;

Please try the attached optixPathTracer.cpp.
That version will render an empty Cornell Box and each time you hit key A, it will add a random triangle with one of the three materials (white, red, green) to the scene.

This is the least amount of changes which works for dynamic geometry updates in that example.
It’s not fully optimized because the createSBT() routine does too much when called more than once.

Just diff the source code with the original OptiX SDK 8.0.0 version and you’ll see the changes.
I also commented out some unused things, replaced the Vertex struct with a float4 type, and changed the light color.
optixPathTracer.cpp (46.3 KB)


Thank you very much, I am really glad to have all this now. I think I have to work a lot more to get into the OptiX pipeline, I am definitely not ready for OptiX instancing yet :)
I will start comparing the optixPathTracer.cpp you attached with the original one from OptiX SDK 8.0 and I will get back here if some spots are still unclear (mainly regarding taking care of all the memory leaks).

I tried the update geometry function you sent and it works as you described. While I was tweaking the code I found another part that I would like to ask about.
What if, instead of one triangle each time I press the "a" button, I create a lot of triangles through a slow for loop:

// Every time this function is called, 10000 triangles at random positions with random materials are added.
void updateGeometry(PathTracerState& state)
{
    for (int i = 0; i < 10000; i++) {
        // Translate the triangle inside the Cornell Box.
        float tx = float(rand()) / float(RAND_MAX);
        float ty = float(rand()) / float(RAND_MAX);
        float tz = float(rand()) / float(RAND_MAX);

        // The Cornell Box extents are [0, 0, 0] x [556.0, 548.8, 559.2], Light at y = 548.6
        tx = tx * (556.0f - COORDINATE_SCALE);
        ty = ty * (548.0f - COORDINATE_SCALE);
        tz = tz * (559.2f - COORDINATE_SCALE);

        float4 translation = make_float4(tx, ty, tz, 0.0f);

        float4 v0 = getRandomVertex() + translation;
        float4 v1 = getRandomVertex() + translation;
        float4 v2 = getRandomVertex() + translation;

        state.vertices.push_back(v0);
        state.vertices.push_back(v1);
        state.vertices.push_back(v2);

        uint32_t materialIndex = rand() % 3; // 0 == white, 1 == green, 2 == red.

        state.mat_indices.push_back(materialIndex);
    }
}

Running this code the geometry calculated in the CPU does not manage to be updated consistently in the GPU. In this case, what is the most common practice to update the renderer geometry? Should I implement a mutex/condition variable in buildMeshAccel?
Thanks!

Running this code the geometry calculated in the CPU does not manage to be updated consistently in the GPU.

What do you mean with “does not manage to be updated consistently in the GPU”?

Your code change works just fine and nothing else needs to be changed.
(In my example code you might want to make the triangles a little smaller to not completely fill up the Cornell Box with 10000 big triangles each time you hit “A”. Try #define COORDINATE_SCALE 5.0f instead.)

There cannot be synchronization issues when updating the host-side geometry in a single-threaded application, and as long as the optixAccelBuild is using the same CUDA stream as the optixLaunch, the AS rebuilds cannot happen while the renderer is using the previous data. Also the cudaMalloc and cudaFree calls are synchronous.

If you’re concerned about the host speed, there are simple things like reserving the right amount of space inside the two vectors to make the push_back() faster.
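For example (a small sketch using the vectors from your snippet):

// Reserve the capacity up front so the repeated push_back() calls don't trigger reallocations.
const size_t new_triangles = 10000;
state.vertices.reserve( state.vertices.size() + new_triangles * 3 );
state.mat_indices.reserve( state.mat_indices.size() + new_triangles );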

If the geometry can be generated on the GPU (like from some CUDA simulation result) it would of course be faster to keep the buffers on the GPU.

If you mean it’s not taking the same time to update the GAS each time you added more geometry, that is obviously to be expected. The bigger the GAS, the more time it takes to build.

(Though it's not really slow. I tested that on my RTX 6000 Ada adding 1 MTris and it gets a little slower each time, but with 10 million triangles the whole box is black already because it's crammed full with triangles and no lighting gets out.)

That code is just an example. I was assuming you’re replacing that with some code which adds real model geometry like cubes etc. or something loaded from a model file.
When building a scene with different models, it would be faster to build a GAS per model and then add that under a top-level instance acceleration structure (IAS) as explained before.
If you're building something with lots of individual geometric primitives, it would also make sense to split the primitives into reasonably sized individual GASes (e.g. 10,000 to 1,000,000 primitives per GAS) and add them to a top-level IAS.

Did you benchmark this in full release mode builds?
OptiX SDK examples are translated with debug device code in debug targets and are really slow.
https://forums.developer.nvidia.com/t/a-problem-when-i-want-to-createmodule/276228/2

Please always provide the following system configuration information when asking about OptiX issues:
OS version, installed GPU(s), VRAM amount, display driver version, OptiX major.minor.micro version, CUDA toolkit version used to generate the module input (PTX or OptiX-IR?), host compiler version.