In the Device.cpp code linked above, you can see that I have raygen, miss, exception, closest hit and any hit programs, as well as different direct callable programs, in different PTX files. That's all there is to it.
None of these hit your initial issue because, with this code structure, each PTX file always contains at least one program of one of these types.
That partitioning into separate files isn’t necessary and you could also put everything into one source file.
It's simply a matter of balancing compile times during development against initial startup times at runtime.
Since OptiX caches the internally assembled code and CUDA does the same with the final microcode, it’s only the initial startup of the application where the pipeline creation path is slow.
You can see this by setting the log callback level in OptixDeviceContextOptions to 4, which prints the program cache hits and misses.
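A minimal sketch of such a log callback, assuming the OptiX 7 API (`OptixDeviceContextOptions` with its `logCallbackFunction`, `logCallbackData` and `logCallbackLevel` fields). The name `contextLogCallback` and the use of `cbdata` to collect the text into a `std::string` are hypothetical choices for illustration. The callback itself compiles without OptiX; the registration is only shown in comments because it needs the OptiX headers and a CUDA context.

```cpp
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

// Same signature as OptixLogCallback:
//   void (*)(unsigned int level, const char* tag, const char* message, void* cbdata)
// Prints each message in the "[level][tag]: message" layout seen in the log below.
// If cbdata is non-null it is treated as a std::string* and the line is appended
// to it as well (this use of cbdata is an assumption of this sketch).
void contextLogCallback(unsigned int level, const char* tag, const char* message, void* cbdata)
{
    std::ostringstream line;
    line << "[" << std::setw(2) << level << "][" << std::setw(12) << tag << "]: " << message << "\n";
    std::cerr << line.str();
    if (cbdata != nullptr)
    {
        static_cast<std::string*>(cbdata)->append(line.str());
    }
}

// Registering it (requires <optix.h> and an existing CUDA context cuContext):
//
//   std::string logText;
//   OptixDeviceContextOptions options = {};
//   options.logCallbackFunction = &contextLogCallback;
//   options.logCallbackData     = &logText;
//   options.logCallbackLevel    = 4;  // 4 = print level, reports cache hits and misses
//   OptixDeviceContext context = nullptr;
//   optixDeviceContextCreate(cuContext, &options, &context);
```

With level 4 the callback also receives status messages, so it is a convenient way to see whether a slow startup comes from cache misses.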
However, in this case, even though the original big .cu file (environmentRender.cu) still contains tons of OptiX programs, an extra dummy __raygen__ program is needed; otherwise the module cannot be created and this is returned:
[ 2][COMPILE FEEDBACK]: COMPILE ERROR: Invalid PTX input: ptx2llvm-module-001: error: Failed to parse input PTX string
ptx2llvm-module-001, line 9; warning : Unsupported .version 6.5; current version is ‘6.4’
ptx2llvm-module-001, line 460; fatal : Parsing error near ‘.version 6.5’: syntax error
Cannot parse input PTX string
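For reference, a dummy raygen program of the kind described above can be as small as an empty entry function. This is a sketch: the name `__raygen__dummy` is hypothetical, and only the `__raygen__` prefix matters, since OptiX identifies a program's type by that semantic name prefix. It would sit in environmentRender.cu next to the existing programs.

```cuda
// Hypothetical placeholder ray generation program. It does nothing; it only
// ensures the module contains an entry point with the __raygen__ prefix.
extern "C" __global__ void __raygen__dummy()
{
}
```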