Hello,
We're working with an optical engineering application for non-sequential ray-tracing simulations. It can run the simulation on a GPU and uses the OptiX library for that. One of the configurable settings is the number of splits to consider in a trace, which translates to the trace depth in OptiX.
What we observe is that for a single-precision simulation the maximum trace depth is 10 on an RTX A4500, even for the simplest geometry. The application hits a limit when it creates the ray-tracing pipeline with a larger value.
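To make it concrete: the application is closed source, so I can't see its actual host code, but my assumption is that it does something roughly along these lines (my own sketch, the function and parameter names are mine, not the application's):

```cpp
#include <optix.h>
#include <optix_stubs.h>

// Hypothetical sketch of the host-side setup, NOT the application's actual code.
// All arguments are assumed to be created elsewhere.
OptixPipeline createPipeline(OptixDeviceContext context,
                             const OptixPipelineCompileOptions& compileOptions,
                             const OptixProgramGroup* programGroups,
                             unsigned int numProgramGroups,
                             unsigned int traceDepth)  // the "number of splits" setting
{
    OptixPipelineLinkOptions linkOptions = {};
    linkOptions.maxTraceDepth = traceDepth;  // values > 10 (single precision) fail for us on the RTX A4500

    char   log[2048];
    size_t logSize = sizeof(log);
    OptixPipeline pipeline = nullptr;
    if (optixPipelineCreate(context, &compileOptions, &linkOptions,
                            programGroups, numProgramGroups,
                            log, &logSize, &pipeline) != OPTIX_SUCCESS)
        return nullptr;  // this is roughly where the limit seems to show up for us
    return pipeline;
}
```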
After skimming the OptiX documentation, my understanding is that this is related to the total stack size the pipeline would require for the trace. Since that size seems to depend only on the programs the pipeline is composed of and on the trace depth, but NOT on the size of the scene graph (BVH), it would explain the observed behavior. When performing a double-precision trace, the limit is 5, i.e. half of the single-precision limit, which is also what I would expect from this understanding.
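If I read the stack-size helpers in optix_stack_size.h correctly, the computation looks roughly like this (again only a sketch under my assumptions: `stackSizes` is taken to be already accumulated from the pipeline's program groups, and no callables are used):

```cpp
#include <optix.h>
#include <optix_stubs.h>
#include <optix_stack_size.h>

// Sketch of how I understand the stack sizing: the per-thread continuation
// stack is derived from the per-program stack sizes and maxTraceDepth only;
// the size of the scene graph (BVH) does not enter the computation.
void configureStack(OptixPipeline pipeline,
                    const OptixStackSizes& stackSizes,  // accumulated from all program groups
                    unsigned int maxTraceDepth)
{
    unsigned int dcStackFromTraversal = 0;
    unsigned int dcStackFromState     = 0;
    unsigned int continuationStack    = 0;

    // No continuation/direct callables assumed in this sketch (maxCCDepth = maxDCDepth = 0).
    optixUtilComputeStackSizes(&stackSizes, maxTraceDepth,
                               /*maxCCDepth=*/0, /*maxDCDepth=*/0,
                               &dcStackFromTraversal,
                               &dcStackFromState,
                               &continuationStack);

    // continuationStack grows roughly linearly with maxTraceDepth, so a trace
    // that needs about twice the stack per level (double precision) should hit
    // the same ceiling at about half the depth, which matches what we see.
    optixPipelineSetStackSize(pipeline,
                              dcStackFromTraversal,
                              dcStackFromState,
                              continuationStack,
                              /*maxTraversableGraphDepth=*/1);
}
```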
Although the stack size a pipeline requires for a particular call graph is hard to predict, because it depends on so many parameters, there must be some limited resource on the GPU that determines when the requirement becomes too high.
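To get a feeling for the orders of magnitude, I did a back-of-the-envelope estimate, assuming the stack is reserved for every thread the GPU can keep resident at once; the per-thread stack size below is a made-up placeholder, not a value our application reports:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Back-of-the-envelope estimate only: total stack memory if a fixed
// continuation stack were reserved for every resident thread on the device.
int main()
{
    cudaDeviceProp prop{};
    cudaGetDeviceProperties(&prop, 0);

    const size_t residentThreads =
        static_cast<size_t>(prop.multiProcessorCount) * prop.maxThreadsPerMultiProcessor;

    const size_t stackPerThread = 8 * 1024;  // hypothetical: 8 KiB continuation stack per thread

    std::printf("SMs: %d, resident threads: %zu\n", prop.multiProcessorCount, residentThreads);
    std::printf("total stack if reserved per resident thread: %.1f MiB\n",
                (residentThreads * stackPerThread) / (1024.0 * 1024.0));
    return 0;
}
```

This obviously only gives an order of magnitude, since I don't know how OptiX actually sizes and places that stack.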
I would like to understand which resource that is (memory, L2 cache, L1 cache, … ?) so we can make an educated decision about which GPU to choose to extend the trace-depth limit for a particular situation. The capability of the RTX A4500 is unfortunately too low in our particular case.