The OptiX programming guide suggests keeping the stack as small as possible. However, this becomes annoying while the application is under development, since each change may require re-tuning the stack. I am wondering whether there is a way to estimate the stack budget, rather than just setting a very large value each time.
To do this, I need to better understand the stack size: how and where it is used. From the comments in the OptiX SDK 5.1.1, I only know that the stack size is given in bytes and is a single global value for an OptiX context.
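For reference, the only knob I can find is the context-wide setter in the C++ wrapper. A minimal sketch (1800 bytes is just the value the SDK samples use):

```cpp
#include <optixu/optixpp_namespace.h>
#include <cstdio>

int main()
{
    optix::Context context = optix::Context::create();
    context->setStackSize(1800);  // single context-wide value, in bytes
    std::printf("stack size = %llu bytes\n",
                (unsigned long long)context->getStackSize());
    return 0;
}
```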
- What is the reference object for this value? Is it the number of bytes per CUDA block, or per thread?
- If my OptiX kernel is register-bound, could I just set a large stack budget once and leave it?
- From https://devtalk.nvidia.com/default/topic/527768/optix/optix-stack-size/ I know that the OptiX stack lives in local memory (LMEM), which is cached in L1. Does that mean the more stack I allocate for OptiX, the less L1 is left for other caching?
For example, I tried increasing my stack size from 1000 to 3000 bytes and saw no visible performance difference across a few test scenes. So I guess there should be some way to estimate the budget.
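In case there is no analytic way, one idea I'm considering is a small tuning harness (my own sketch, not from the SDK) that enables the stack-overflow exception and doubles the stack until a test frame renders cleanly; the doubling range and the test-frame criterion here are my assumptions:

```cpp
#include <optixu/optixpp_namespace.h>
#include <cstdio>

// Hypothetical helper: assumes 'context' is fully set up (programs, geometry,
// output buffer) and that entry point 0 renders a worst-case test frame.
void findWorkingStackSize(optix::Context context, RTsize width, RTsize height)
{
    // Make overflows visible: the default exception program prints a message
    // when a thread exhausts the stack, instead of failing silently.
    context->setExceptionEnabled(RT_EXCEPTION_STACK_OVERFLOW, true);
    context->setPrintEnabled(true);

    for (RTsize bytes = 256; bytes <= 16384; bytes *= 2)
    {
        context->setStackSize(bytes);
        std::printf("trying stack size = %llu bytes\n",
                    (unsigned long long)bytes);
        context->launch(0, width, height);
        // Keep the first size whose frame is correct and overflow-free,
        // then add some safety margin for deeper recursion paths.
    }
}
```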
BTW, in the OptiX 5.1.1 PathTracer sample, changing the stack size from 1800 to 40000 on a GTX 1080 Ti makes no visible performance difference at 1080p, but changing it to 90000 makes the demo super slow. Does that mean something?