Hi, I have developed a radiometrically accurate OptiX kernel that traces rays through unstructured, triangulated volumes containing multiple molecular species. The solution works well, but it can require considerable per-ray stack memory (approximately 1 KB), depending on the number of wavelengths and molecular species used in the per-ray radiance transport integration.
My problem is that if the per-ray stack memory is sized for the worst-case scenario, the solution runs much slower than if the stack is sized to the actual number of wavelengths and molecular species defined in the user's inputs.
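For context, the per-ray working storage is conceptually along these lines (a minimal sketch only; the identifiers, defaults, and program name are placeholders, not my actual kernel code):

```cpp
// Illustrative sketch -- NUM_WAVELENGTHS and NUM_SPECIES stand in for whatever
// compile-time constants size the per-ray stack in the real kernel.
#ifndef NUM_WAVELENGTHS
#define NUM_WAVELENGTHS 64   // hypothetical worst-case default; most runs need far fewer
#endif
#ifndef NUM_SPECIES
#define NUM_SPECIES 8
#endif

extern "C" __global__ void __raygen__radiance()
{
    // Per-ray accumulators for the transport integration; their dimensions
    // drive the stack/local-memory footprint of every launched ray.
    float opticalDepth[NUM_WAVELENGTHS * NUM_SPECIES];
    float radiance[NUM_WAVELENGTHS];
    // ... trace segments through the volume mesh and integrate ...
}
```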
My current solution is to dynamically build the kernel during program initialization by passing various defines to the nvcc compiler to size the stack. The drawback of this approach is that it requires the end user to become a licensed OptiX user and to download its libraries (a code dependency that some users may not want to take on).
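The initialization step amounts to something like the sketch below (the paths, define names, and nvcc options are placeholders; the real code also caches the PTX and passes it on to OptiX module creation, which is omitted here):

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Sketch of the runtime build step: compile the kernel source to PTX with the
// stack dimensions baked in as preprocessor defines. The include path and the
// exact options depend on the installed CUDA toolkit and OptiX SDK versions.
std::string buildKernelPtx(int numWavelengths, int numSpecies)
{
    std::string cmd =
        "nvcc -ptx radiance_kernel.cu -o radiance_kernel.ptx"
        " -I${OPTIX_SDK}/include"
        " -DNUM_WAVELENGTHS=" + std::to_string(numWavelengths) +
        " -DNUM_SPECIES="     + std::to_string(numSpecies);

    if (std::system(cmd.c_str()) != 0)
        throw std::runtime_error("nvcc failed to build the radiance kernel");

    return "radiance_kernel.ptx";  // caller reads this file and creates the OptiX module
}
```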
Alternatively, I can pre-build multiple kernels for different stack sizes and ship them with my release packages. This is doable, but it leads to messy packaging and maintenance problems when users request configurations that haven't been pre-built.
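Concretely, the pre-built route would reduce to a selection step roughly like the one below, assuming the release ships a few size tiers (the tier values and file names here are hypothetical):

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical selection of a pre-built PTX variant: round the requested
// wavelength count up to the nearest shipped tier. Requests above the largest
// tier are exactly the "configurations that haven't been pre-built" problem.
std::string selectPrebuiltPtx(int numWavelengths)
{
    static const std::map<int, std::string> tiers = {
        {  16, "radiance_w16.ptx"  },
        {  64, "radiance_w64.ptx"  },
        { 256, "radiance_w256.ptx" },
    };
    auto it = tiers.lower_bound(numWavelengths);  // first tier >= request
    if (it == tiers.end())
        throw std::runtime_error("no pre-built kernel covers this configuration");
    return it->second;
}
```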
Any advice would be greatly appreciated as I move this code from a research application to an enterprise application.
Thanks,
Dennis