How to handle multiple ray generators

Ladies and gentlemen:

I have a few different approaches for generating rays, and I can implement them in different ways with OptiX 7:

Option 1: a ray gen function for each approach.
Option 2: only one ray gen function and a continuation callable function to handle different approaches.
Option 3: only one ray gen function and an ugly switch-case to handle different approaches.
Option 4: a pipeline for each approach (I would prefer not to go in this direction)

Would there be any run-time performance differences among these four approaches?

  • If yes, which one is usually the fastest, and by how much?
  • I am a lazy guy and really want to go with Option 3. Only one approach is used at a time, so there is no dynamic branching, and it should have no performance penalty, right?

Thanks,

X.

It would help to know roughly what these different ray generation programs are doing.

Option 1: a ray gen function for each approach.
That would be one pipeline with as many shader binding tables (SBTs) as you have raygen programs.
Depending on what the raygen programs do, the shader binding tables might be completely identical except for the raygenRecord pointer.
The pipeline compilation would need to take all programs in the pipeline into account. That means resources like the number of registers would be defined by the biggest requirement. If the different raygen programs use vastly different amounts of registers, it might be faster not to have them all in the same pipeline.
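
To make the SBT point concrete, here is a minimal host-side sketch under assumed names (the record pointers, strides, and the helper function are illustrative, not from a specific sample): both launches share every record and differ only in raygenRecord.

```cpp
#include <cuda.h>
#include <optix.h>
#include <optix_stubs.h>

// Minimal sketch: launch with one of two raygen programs that live in the same
// pipeline. All device pointers are assumed to point at SBT records that were
// already packed with optixSbtRecordPackHeader() and copied to the device.
void launchWithSelectedRaygen(OptixPipeline pipeline, CUstream stream,
                              CUdeviceptr d_params, size_t paramsSize,
                              CUdeviceptr d_raygenRecordA,   // record of raygen program A
                              CUdeviceptr d_raygenRecordB,   // record of raygen program B
                              CUdeviceptr d_missRecord, unsigned int missRecordStride,
                              CUdeviceptr d_hitRecord, unsigned int hitRecordStride,
                              bool useA, unsigned int width, unsigned int height)
{
    // Both launches share every record; only raygenRecord differs.
    OptixShaderBindingTable sbt = {};
    sbt.raygenRecord                = useA ? d_raygenRecordA : d_raygenRecordB;
    sbt.missRecordBase              = d_missRecord;
    sbt.missRecordStrideInBytes     = missRecordStride;
    sbt.missRecordCount             = 1;
    sbt.hitgroupRecordBase          = d_hitRecord;
    sbt.hitgroupRecordStrideInBytes = hitRecordStride;
    sbt.hitgroupRecordCount         = 1;

    optixLaunch(pipeline, stream, d_params, paramsSize, &sbt, width, height, 1);
}
```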

Option 2: only one ray gen function and a continuation callable function to handle different approaches.
Continuation callables add quite some resource-management overhead, similar to any other program which can call optixTrace. When possible, you should avoid them for performance reasons.
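
For completeness, a minimal device-side sketch of that setup, assuming one callable record per approach in the SBT callables table and a hypothetical rayGenCallableIndex launch parameter (both are illustrative, not from your code). Continuation callables, unlike direct callables, are allowed to call optixTrace themselves.

```cpp
#include <optix.h>

// Hypothetical launch parameters; the variable name must match
// pipelineLaunchParamsVariableName in the pipeline compile options.
struct Params
{
    OptixTraversableHandle handle;
    unsigned int           width;
    unsigned int           height;
    unsigned int           rayGenCallableIndex; // SBT callables index of the approach to run
};

extern "C" __constant__ Params params;

// One continuation callable per ray generation approach. Continuation callables
// (unlike direct callables) may call optixTrace themselves.
extern "C" __device__ void __continuation_callable__pinhole(float2 d)
{
    const float  len = sqrtf(d.x * d.x + d.y * d.y + 1.0f);
    const float3 org = make_float3(0.0f, 0.0f, 1.0f);
    const float3 dir = make_float3(d.x / len, d.y / len, -1.0f / len);

    optixTrace(params.handle, org, dir,
               0.0f, 1e16f, 0.0f,              // tmin, tmax, ray time
               OptixVisibilityMask(255), OPTIX_RAY_FLAG_NONE,
               0, 1, 0);                       // SBT offset, stride, miss index; payload omitted
}

extern "C" __global__ void __raygen__main()
{
    const uint3  idx = optixGetLaunchIndex();
    const float2 d   = make_float2(2.0f * idx.x / params.width  - 1.0f,
                                   2.0f * idx.y / params.height - 1.0f);

    // The callables SBT index selects the approach for this launch.
    optixContinuationCall<void, float2>(params.rayGenCallableIndex, d);
}
```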

Option 3: only one ray gen function and an ugly switch-case to handle different approaches.
That can actually be the fastest solution if you share as much code as possible and strive to minimize live variables. The comments on register pressure above apply here as well.
It also depends greatly on how many optixTrace calls you have in your code. The goal is to have as few as possible!
(For example, in my simpler unidirectional path tracers, I have only two optixTrace calls in the whole renderer code: one for the radiance ray shot in the raygen program and one in the single(!) closest hit program for the visibility ray.)
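
A minimal device-side sketch of that structure (the camType launch parameter and the two toy cameras are illustrative assumptions): the selector is uniform per launch, the branches only set up origin and direction, and everything funnels into one shared optixTrace call.

```cpp
#include <optix.h>

enum CamType { CAM_PINHOLE = 0, CAM_ORTHO = 1 };

// Hypothetical launch parameters; camType is constant for the whole launch.
struct Params
{
    OptixTraversableHandle handle;
    unsigned int           width;
    unsigned int           height;
    int                    camType;
};

extern "C" __constant__ Params params;

extern "C" __global__ void __raygen__single()
{
    const uint3  idx = optixGetLaunchIndex();
    const float2 d   = make_float2(2.0f * idx.x / params.width  - 1.0f,
                                   2.0f * idx.y / params.height - 1.0f);

    float3 origin    = make_float3(0.0f, 0.0f, 0.0f);
    float3 direction = make_float3(0.0f, 0.0f, -1.0f);

    // Uniform per launch, so all threads take the same branch.
    switch (params.camType)
    {
    case CAM_PINHOLE:
    {
        origin = make_float3(0.0f, 0.0f, 1.0f);
        const float len = sqrtf(d.x * d.x + d.y * d.y + 1.0f);
        direction = make_float3(d.x / len, d.y / len, -1.0f / len);
        break;
    }
    case CAM_ORTHO:
        origin = make_float3(d.x, d.y, 1.0f);
        break;
    }

    // Every approach funnels into this single optixTrace call.
    optixTrace(params.handle, origin, direction,
               0.0f, 1e16f, 0.0f,              // tmin, tmax, ray time
               OptixVisibilityMask(255), OPTIX_RAY_FLAG_NONE,
               0, 1, 0);                       // SBT offset, stride, miss index; payload omitted
}
```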

Option 4: a pipeline for each approach (I would prefer not to go in this direction)
That could actually be beneficial if the resource usage is very different. See Option 1.

Option 5: Using a single ray generation program and direct callable programs to calculate the data required for the following optixTrace calls.
That also adds calling overhead and register pressure, but direct callables are my favorite way to reduce the required device code to a minimum.
Scroll to the bottom of this page for such a renderer architecture:
https://github.com/nvpro-samples/optix_advanced_samples/tree/master/src/optixIntroduction
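
As a rough sketch of that pattern (the cameraCallableIndex parameter and the toy orthographic callable are illustrative assumptions): the direct callable only computes origin and direction, because direct callables cannot call optixTrace, and the single trace stays in the raygen program.

```cpp
#include <optix.h>

// Hypothetical launch parameters; cameraCallableIndex selects the SBT callables
// record of the ray generation approach to use for this launch.
struct Params
{
    OptixTraversableHandle handle;
    unsigned int           width;
    unsigned int           height;
    unsigned int           cameraCallableIndex;
};

extern "C" __constant__ Params params;

// One direct callable per approach. Direct callables cannot call optixTrace,
// so they only compute the ray; the single trace stays in the raygen program.
extern "C" __device__ void __direct_callable__ortho(float2 d, float3& org, float3& dir)
{
    org = make_float3(d.x, d.y, 1.0f);
    dir = make_float3(0.0f, 0.0f, -1.0f);
}

extern "C" __global__ void __raygen__main()
{
    const uint3  idx = optixGetLaunchIndex();
    const float2 d   = make_float2(2.0f * idx.x / params.width  - 1.0f,
                                   2.0f * idx.y / params.height - 1.0f);

    float3 origin, direction;
    optixDirectCall<void, float2, float3&, float3&>(params.cameraCallableIndex,
                                                    d, origin, direction);

    optixTrace(params.handle, origin, direction,
               0.0f, 1e16f, 0.0f,              // tmin, tmax, ray time
               OptixVisibilityMask(255), OPTIX_RAY_FLAG_NONE,
               0, 1, 0);                       // SBT offset, stride, miss index; payload omitted
}
```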

Putting performance numbers on these options isn’t possible without actual implementations comparing exactly the same thing. Your mileage may vary; you’d need to measure that for your specific use case.
Normally, the fewer memory accesses, dynamic calls, optixTrace calls, and live variables, and the smaller the code, the better the performance.
With RTX hardware raytracing acceleration, BVH traversal and shooting rays are no longer the slowest operations. Optimizing the shading code becomes really important then.

Hi Detlef,

That’s a really detailed response.
It delivers a lot of useful information.

Thanks a ton!