Kernel Recompilation

GL_Kyle · April 23, 2014, 9:41pm

I’m encountering fairly large delays due to kernel recompilation in an application that requires fluid user interaction. I’ve been trying to stamp out everything that triggers a kernel recompile – I’ve had some success, but I can’t seem to eliminate it entirely. So my question is this: Which operations trigger a kernel recompile?

nljones · April 23, 2014, 10:07pm

In general, rtDeclareVariable() triggers a recompile and rtVariableSet() does not. rtBufferMap() and rtBufferUnmap() also do not cause a recompile, so you can change the contents of a buffer if it already exists. My suggestion is that you create all the variables, buffers, materials, programs, etc., that you could possibly use up front, and between kernel invocations, you only set values.
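To illustrate the "create everything up front, then only set values" idea, here is a minimal sketch in plain C++. It makes no OptiX calls: SpherePool, SphereSlot, and both counters are hypothetical names invented for this example, and treating a radius of 0 as "unused" is an assumption that the intersection program would have to honor. In a real application the slots would presumably live in a buffer that is created once and rewritten through rtBufferMap() / rtBufferUnmap().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical slot for one sphere. In a real application this data would
// live in a buffer that is created once and rewritten between launches.
struct SphereSlot {
    float x = 0.0f, y = 0.0f, z = 0.0f;
    float radius = 0.0f;  // assumption: radius 0 marks the slot as unused
};

class SpherePool {
public:
    // One-time structural setup, done before the first launch.
    explicit SpherePool(std::size_t capacity) : slots_(capacity) {
        ++structural_changes_;
    }

    // Value-only update: fills the first free slot.
    // Returns the slot index, or -1 if every slot is in use.
    int activate(float x, float y, float z, float radius) {
        assert(radius > 0.0f);
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            if (slots_[i].radius == 0.0f) {
                slots_[i].x = x;
                slots_[i].y = y;
                slots_[i].z = z;
                slots_[i].radius = radius;
                ++value_updates_;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Value-only update: frees a slot without changing the slot count.
    void deactivate(int index) {
        assert(index >= 0 && static_cast<std::size_t>(index) < slots_.size());
        slots_[static_cast<std::size_t>(index)].radius = 0.0f;
        ++value_updates_;
    }

    std::size_t capacity() const { return slots_.size(); }
    int structural_changes() const { return structural_changes_; }
    int value_updates() const { return value_updates_; }

private:
    std::vector<SphereSlot> slots_;
    int structural_changes_ = 0;  // things that would be created up front
    int value_updates_ = 0;       // things that would only be "set" later
};
```

The point of the two counters is only to make the split visible: adding and removing spheres at run time bumps value_updates() while structural_changes() stays at 1. The cost is that the capacity has to be chosen in advance.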

GL_Kyle · April 23, 2014, 10:47pm

Thanks! That’s extremely helpful.

Is it safe to assume the same line of thinking applies to geometry, groups, and acceleration structures?

nljones · April 23, 2014, 11:04pm

That would be my expectation.

GL_Kyle · April 24, 2014, 1:26am

Removing all of the rtDeclareVariable calls didn’t fix the problem. It did, however, allow me to narrow the cause down to these functions:

rtGeometryGroupSetChildCount
rtGeometryGroupSetChild

Does anyone know if it’s intended that these functions trigger a kernel recompile? That would be unfortunate.

GL_Kyle · April 25, 2014, 4:25am

Just in case anyone else is interested, the answer is yes. I wish kernel recompiles and their ill effects were featured more prominently in the documentation; they are really quite nasty.

adamce · April 25, 2014, 9:16am

I don’t know if rtGeometryGroupSetChildCount causes a kernel recompile, but it certainly causes the acceleration structure to be rebuilt.

Did you try changing the acceleration structure type? Some of them build faster than others.

Regarding the kernel recompile:
Did you read the section on performance in the programming guide (ch. 11)? It says that changing the programs of materials also causes a recompile.

GL_Kyle · April 25, 2014, 6:42pm

Yes, it does seem to be a compile and not an acceleration structure rebuild. I’ll try to share my understanding of what’s going on. If anyone sees an error, please correct me; maybe you’ll save me and my team from rewriting a lot of code :)

Normally, the compile, the acceleration structure rebuild, and the ray trace seem to be bundled up into the launch function, where they form a pipeline. The compile and the acceleration structure rebuild only take place if necessary, though.

For benchmarking purposes, the pipeline can be separated into its steps by first calling rtContextCompile to trigger a compile, then launching with (0, 0) as the launch dimensions to trigger an acceleration structure rebuild, and finally launching over the proper viewport dimensions to perform a trace.
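For anyone who wants to reproduce the measurement, here is a rough sketch of timing the three steps separately. The Stage and StageTiming types and the time_stages function are hypothetical helpers made up for this example, and the sketch itself contains no OptiX calls. The comments only mark where rtContextCompile and the two rtContextLaunch2D calls would go, and it assumes each of those calls blocks until its work is finished.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <vector>

// One named step of the pipeline (hypothetical helper type).
struct Stage {
    std::string name;
    std::function<void()> run;
};

// Wall-clock time of one step, in milliseconds.
struct StageTiming {
    std::string name;
    double ms;
};

// Runs the stages in order and records how long each one took.
std::vector<StageTiming> time_stages(const std::vector<Stage>& stages) {
    using clock = std::chrono::steady_clock;
    std::vector<StageTiming> result;
    for (const Stage& stage : stages) {
        assert(stage.run);  // every stage needs a callable
        const clock::time_point start = clock::now();
        stage.run();
        const std::chrono::duration<double, std::milli> elapsed =
            clock::now() - start;
        result.push_back({stage.name, elapsed.count()});
    }
    return result;
}

// Intended use (the lambda bodies are placeholders, not tested code):
//   time_stages({
//       {"compile",       [&] { /* rtContextCompile(context); */ }},
//       {"accel rebuild", [&] { /* rtContextLaunch2D(context, 0, 0, 0); */ }},
//       {"trace",         [&] { /* rtContextLaunch2D(context, 0, w, h); */ }},
//   });
```

Keeping the stages as plain callables means the same helper works whether or not a compile is actually needed on a given frame, so the first column of timings shows directly which scene edits trigger one.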

When I add an object to my scene (a simple implicit-surface sphere in this case), the compile takes ~2s, the acceleration structure rebuild takes ~4ms, and tracing my scene takes ~2ms. The acceleration structure rebuild is really the least of my worries here: at roughly 2s versus 6ms, the compile takes over 300 times longer than the other two steps combined.

nljones kindly suggested above that the cause of the recompile may be the declaration of new OptiX objects. This seems to be accurate, but the compile also seems to be triggered by moving objects in the scene hierarchy. This is hinted at in the documentation for rtContextCompile: “rtContextCompile creates a final computation kernel from the given context’s programs and scene hierarchy. This kernel will be executed upon subsequent invocations of rtContextLaunch functions.”

It’s not abundantly clear what the documentation means here. For example, I wouldn’t have assumed that declaring new objects counts as changing either the context’s programs or its scene hierarchy, yet it still triggers recompilation. Unfortunately, no amount of acceleration structure optimization is going to save me from multi-second compiles in the middle of operation. I’m looking into workarounds that seem to work, but they are not pretty.

adamce · April 25, 2014, 9:06pm

Wow, multi-second compiles? The program I’m working on right now has about 700 lines of CUDA and several callable programs, and it compiles in a fraction of a second.

I also just recalled that callable programs shorten compile time. Maybe this could solve the issue on a different front.