Optix PathTracer: how to implement an updateGeometry functionality

droettger · March 18, 2024, 8:33am

Running this code the geometry calculated in the CPU does not manage to be updated consistently in the GPU.

What do you mean by “does not manage to be updated consistently in the GPU”?

Your code change works just fine and nothing else needs to be changed.
(In my example code you might want to make the triangles a little smaller to not completely fill up the Cornell Box with 10000 big triangles each time you hit “A”. Try #define COORDINATE_SCALE 5.0f instead.)

There cannot be synchronization issues when updating the host-side geometry in a single-threaded application. As long as the optixAccelBuild is using the same CUDA stream as the optixLaunch, the AS rebuild cannot happen while the renderer is still using the previous data. Also, the cudaMalloc and cudaFree calls are synchronous.

If you’re concerned about the host speed, there are simple things like reserving the right amount of space inside the two vectors to make the push_back() faster.
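As a minimal sketch of that reservation, assuming the host geometry lives in one vertex vector and one index vector (the Float3 and UInt3 structs and the appendTriangles function are hypothetical stand-ins, not the names used in the example code):

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-ins for the vertex and index types of the example code.
struct Float3 { float x, y, z; };
struct UInt3  { unsigned int x, y, z; };

// Appends numTriangles independent triangles to the existing host-side geometry.
void appendTriangles(std::vector<Float3>& vertices,
                     std::vector<UInt3>&  indices,
                     std::size_t          numTriangles)
{
  // The indices are 32-bit, so the final vertex count must fit into that range.
  assert(vertices.size() + numTriangles * 3 <= std::numeric_limits<unsigned int>::max());

  // Reserve the final sizes once, so none of the push_back() calls below reallocates.
  vertices.reserve(vertices.size() + numTriangles * 3);
  indices.reserve(indices.size() + numTriangles);

  for (std::size_t i = 0; i < numTriangles; ++i)
  {
    const unsigned int base   = static_cast<unsigned int>(vertices.size());
    const float        offset = static_cast<float>(i);

    // Placeholder positions. Real code would put the generated or loaded model data here.
    vertices.push_back({offset,        0.0f, 0.0f});
    vertices.push_back({offset + 1.0f, 0.0f, 0.0f});
    vertices.push_back({offset,        1.0f, 0.0f});

    indices.push_back({base, base + 1, base + 2});
  }
}
```

This only removes the repeated reallocations and copies on the host. The upload to the device and the acceleration structure build cost the same as before.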

If the geometry can be generated on the GPU (like from some CUDA simulation result) it would of course be faster to keep the buffers on the GPU.

If you mean it’s not taking the same time to update the GAS each time you add more geometry, that is obviously to be expected. The bigger the GAS, the more time it takes to build.

(Though it’s not really slow. I tested that on my RTX 6000 Ada adding 1 MTris, and it gets a little slower each time, but with 10 million triangles the whole box is already black because it’s crammed full of triangles and no light gets out.)

That code is just an example. I was assuming you’re replacing that with some code which adds real model geometry like cubes etc. or something loaded from a model file.
When building a scene with different models, it would be faster to build a GAS per model and then add that under a top-level instance acceleration structure (IAS) as explained before.
If you’re building something with lots of individual geometric primitives, it would also make sense to split the primitives into reasonably sized individual GAS (e.g. 10,000 to 1,000,000 primitives per GAS) and add them to a top-level IAS.
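A rough sketch of that splitting on the host, assuming the primitives sit in one contiguous array and are simply cut into consecutive ranges (PrimitiveRange and splitIntoGasRanges are hypothetical names for illustration, not part of OptiX or the example code). Each resulting range would then get its own GAS build and one instance in the IAS:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: a contiguous range of primitives that goes into one GAS.
struct PrimitiveRange
{
  std::size_t first; // Index of the first primitive in the range.
  std::size_t count; // Number of primitives in the range.
};

// Splits numPrimitives into consecutive ranges of at most maxPerGas primitives each.
std::vector<PrimitiveRange> splitIntoGasRanges(std::size_t numPrimitives, std::size_t maxPerGas)
{
  assert(maxPerGas > 0);

  std::vector<PrimitiveRange> ranges;
  ranges.reserve((numPrimitives + maxPerGas - 1) / maxPerGas); // Rounded-up range count.

  for (std::size_t first = 0; first < numPrimitives; first += maxPerGas)
  {
    // The last range holds whatever is left over.
    ranges.push_back({first, std::min(maxPerGas, numPrimitives - first)});
  }
  return ranges;
}
```

For example, splitIntoGasRanges(2500000, 1000000) gives three ranges, the last one with 500,000 primitives. Cutting by array order is the simplest choice. If the primitives are scattered in space, grouping them spatially first is probably better, because strongly overlapping GAS bounds make the IAS traversal less efficient.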

Did you benchmark this in a full release mode build?
OptiX SDK examples are compiled with debug device code in debug targets and are really slow.
https://forums.developer.nvidia.com/t/a-problem-when-i-want-to-createmodule/276228/2

Please always provide the following system configuration information when asking about OptiX issues:
OS version, installed GPU(s), VRAM amount, display driver version, OptiX major.minor.micro version, CUDA toolkit version used to generate the module input (PTX or OptiX-IR?), host compiler version.