General consensus - recursive trace calls and performance

Good morning,

I have a theoretical question about recursive trace calls in OptiX 7.2.

I assume that calling optixTrace multiple times from within a shader such as ClosestHit will degrade performance. Say I pass a MAXIMUM recursion value via the launch parameters and call optixTrace recursively from the ClosestHit shader up to that many times: as I increase the MAXIMUM recursion value, performance will get increasingly worse.

What is the general consensus on the resulting performance of these kinds of calls?

Thank you for any information.

The performance is directly related to the number of rays you shoot in your whole scene per optixLaunch.
That means if every hit enters the recursion, the worst case is on the order of 2^N rays per primary ray for N recursions, for example with transparent materials where each hit follows both the reflection and the transmission direction! (It’s not that bad with all opaque materials, but then an iterative algorithm would be much easier.) That transparency case is obviously exponential growth, which means the number of rays gets massive pretty fast, and so does the time each optixLaunch takes.
I’d recommend counting your rays per launch (that is, all optixTrace invocations) as a debug exercise and seeing how your application performance relates to that.
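
To make the growth concrete, here is a minimal host-side C++ sketch of the worst case described above. It is not OptiX code: each function call merely stands in for one optixTrace invocation, and the names countRaysRecursive and countRaysClosedForm are hypothetical. It assumes every hit above the maximum depth spawns exactly one reflection ray and one transmission ray.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not OptiX code: one call represents one optixTrace
// invocation. Every hit above maxDepth spawns a reflection ray and a
// transmission ray (the worst case for transparent materials).
std::uint64_t countRaysRecursive(unsigned depth, unsigned maxDepth)
{
    assert(maxDepth < 63);  // the ray count must fit in 64 bits
    std::uint64_t rays = 1; // the ray traced by this call
    if (depth < maxDepth) {
        rays += countRaysRecursive(depth + 1, maxDepth); // reflection
        rays += countRaysRecursive(depth + 1, maxDepth); // transmission
    }
    return rays;
}

// Closed form of the same count: 1 + 2 + 4 + ... + 2^N = 2^(N+1) - 1,
// of which 2^N rays sit at the deepest level.
std::uint64_t countRaysClosedForm(unsigned maxDepth)
{
    assert(maxDepth < 63);
    return (std::uint64_t{1} << (maxDepth + 1)) - 1;
}
```

Under these assumptions a maximum depth of 10 already means 2047 rays per primary ray, so multiplying by the launch dimensions gives a rough upper bound to compare against a measured ray count.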

But you would get a similar performance impact when shooting the same number of rays iteratively, for example by pushing the next rays from a hit onto a per-ray stack and executing optixTrace per launch index inside the ray generation program.
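
The iterative variant can be sketched on the host in plain C++ as follows. This is an illustration under assumptions, not OptiX code: PendingRay, TraversalStats and traceIteratively are hypothetical names, the loop body marks where the ray generation program would call optixTrace, and every hit above the maximum depth is assumed to spawn two follow-up rays.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-ray record; a real renderer would also store origin,
// direction and throughput here.
struct PendingRay { unsigned depth; };

struct TraversalStats {
    std::uint64_t raysTraced;     // number of simulated optixTrace calls
    std::size_t peakStackEntries; // largest size of the explicit stack
};

// Iterative traversal of the same binary ray tree using an explicit stack.
TraversalStats traceIteratively(unsigned maxDepth)
{
    assert(maxDepth < 63); // the ray count must fit in 64 bits
    std::vector<PendingRay> stack;
    stack.push_back({0}); // primary ray
    TraversalStats stats{0, stack.size()};
    while (!stack.empty()) {
        const PendingRay ray = stack.back();
        stack.pop_back();
        ++stats.raysTraced; // this is where optixTrace would be called
        if (ray.depth < maxDepth) {
            stack.push_back({ray.depth + 1}); // reflection
            stack.push_back({ray.depth + 1}); // transmission
        }
        if (stack.size() > stats.peakStackEntries)
            stats.peakStackEntries = stack.size();
    }
    return stats;
}
```

The number of rays is the same as in the recursive formulation (2^(N+1) - 1 in this model), while the explicit stack never holds more than N + 1 small entries.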

The major difference between recursive and iterative algorithms would be the amount of stack space required.
Since OptiX needs to store live state around each optixTrace call in addition to the local space each hit shader invocation needs, the recursive version can grow too large to fit into the available stack size. A bigger stack also means more memory accesses, which are generally bad for GPU performance.

The standard approach to keep things more interactive is to do less work more often.
Progressive path tracers solve that by being iterative and following only one path stochastically (Monte Carlo algorithm) instead of all possible recursive paths at once. Then they accumulate (integrate) the partial results over multiple launches to converge to the final result.
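
As a rough illustration of that idea, the following host-side C++ sketch follows a single stochastic path per sample and accumulates a running mean over many simulated launches. Everything here is a toy under stated assumptions: the scene (a reflected bounce adds one unit of radiance, a transmitted bounce adds none), the fixed seed, and the names PathSample, tracePathIteratively, accumulateSample and renderProgressively are all hypothetical.

```cpp
#include <cassert>
#include <random>

// Toy scene (hypothetical): each hit reflects with probability `reflectance`.
// A reflected bounce adds 1 unit of radiance, a transmitted bounce adds 0.
// Evaluating both branches recursively would give maxDepth * reflectance.
struct PathSample { double radiance; unsigned raysTraced; };

PathSample tracePathIteratively(std::mt19937& rng, unsigned maxDepth, double reflectance)
{
    assert(reflectance >= 0.0 && reflectance <= 1.0);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    PathSample sample{0.0, 1}; // the primary ray
    for (unsigned depth = 0; depth < maxDepth; ++depth) {
        // Follow ONE continuation instead of both. The selection probability
        // equals the branch weight, so the path throughput stays 1.
        if (uniform(rng) < reflectance)
            sample.radiance += 1.0;
        ++sample.raysTraced; // one continuation ray per bounce
    }
    return sample;
}

// Running mean: folds the sample of launch `launchIndex` (0-based) into the
// accumulated result of all previous launches.
double accumulateSample(double mean, double sample, unsigned launchIndex)
{
    return mean + (sample - mean) / static_cast<double>(launchIndex + 1);
}

double renderProgressively(unsigned launches, unsigned maxDepth, double reflectance)
{
    std::mt19937 rng(1234); // fixed seed so the sketch is reproducible
    double mean = 0.0;
    for (unsigned i = 0; i < launches; ++i)
        mean = accumulateSample(mean, tracePathIteratively(rng, maxDepth, reflectance).radiance, i);
    return mean;
}
```

Each sample costs at most maxDepth + 1 rays instead of an exponential number, and the accumulated mean approaches the value the full recursion would compute as the number of launches grows.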

Note that there is a limit on the recursion depth you can have in OptiX (currently 31) and there is also an (undocumented) internal maximum stack space beyond which optixPipelineSetStackSize will no longer succeed.
https://raytracing-docs.nvidia.com/optix7/guide/index.html#limits#limits
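
A small host-side C++ sketch of how an application might guard against both limits before setting up the pipeline. The depth limit of 31 is the value quoted above. The stack estimate is a deliberately simplified linear model, not the formula OptiX uses, and the names kMaxTraceDepth, clampTraceDepth and estimateRecursiveStackBytes as well as the byte parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Recursion limit quoted in the post above for current OptiX versions.
constexpr unsigned kMaxTraceDepth = 31;

// Clamp a user-supplied MAXIMUM recursion value to the supported limit.
unsigned clampTraceDepth(unsigned requestedDepth)
{
    return requestedDepth < kMaxTraceDepth ? requestedDepth : kMaxTraceDepth;
}

// Simplified ASSUMPTION, not the OptiX formula: the ray generation program
// needs a fixed amount of stack, and every level of recursion adds the live
// state of one more hit shader invocation on top of it.
std::size_t estimateRecursiveStackBytes(std::size_t raygenBytes,
                                        std::size_t perHitBytes,
                                        unsigned maxDepth)
{
    assert(maxDepth <= kMaxTraceDepth);
    return raygenBytes + static_cast<std::size_t>(maxDepth) * perHitBytes;
}
```

The point of the model is only that the recursive stack requirement grows linearly with the trace depth, whereas an iterative ray generation loop keeps the trace depth at 1 regardless of how many bounces it follows.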

Thank you @droettger, some great information.

@picard1969, not long ago I was pondering that very same question and set up a lab with an OptiX implementation of Peter Shirley’s Ray Tracing in One Weekend: the recursive version took about 30% longer than the iterative one. The total was 110+ million rays and the max depth was <16 (see image: the brighter, the deeper). Regards, Jürgen.

@otabuzzman - that is interesting.