Questions about the implementation of OptiX

Hello,

I have been working with OptiX for about a month. It is fast and easy to program, but I still have questions about OptiX’s relationship with CUDA.

First of all, is OptiX built directly on top of CUDA? As far as I can tell, RT_PROGRAM is a macro for __global__. Does that mean I can treat every OptiX program as a CUDA kernel?

If every OptiX program can be treated as a kernel function, that leads to my second question: how is the recursive structure implemented? Take a recursive ray casting task as an example. First, the ray generation program generates rays in parallel. Then the intersection program decides whether each ray hits the geometry. Rays that hit something invoke the closest hit program, which runs some shading logic and, if needed, generates secondary rays. Rays keep being generated recursively in this way until some condition terminates them, and control then returns layer by layer to the ray generation program. So how many kernels are launched in this process? Is everything above completed within a single kernel call, or do multiple kernel calls happen along the way?

My last question, and the one I started with: how is OptiX designed, and how are the stacks implemented? As I understand it, each thread does recursive work. How is this thread-level recursion realized? Perhaps someone from NVIDIA is best placed to answer these questions in detail. Thank you all for reading, and I would really appreciate any thoughts you can share.