Direct callables vs inlined functions

Hi,

Wondering if there is any general guidance on the usage of direct callables vs inlined device functions? Is there an overall performance cost associated with direct callables that isn’t present for inlined code? Seems like most direct callables could be replaced with inlined functions, but wondering if that is not the best practice.

Thanks

Hi @jagj,

There is some very high-level guidance in the Programming Guide here: https://raytracing-docs.nvidia.com/optix7/guide/index.html#callables#callables

The main things to know about callables when evaluating their impact on performance are that 1) they become an explicit state in the OptiX state machine, similar to a “program” like closest-hit; 2) they are not inlined but are instead explicit function calls that use the normal calling mechanism of passing parameters in registers; and 3) inlining does not always improve performance; sometimes it can hurt run times, compile times, or both.

The implication of 1) is that OptiX can internally schedule and synchronize based on states, which can potentially mean improved performance if the callables replace the individual branches of a divergent section of code. For example, if you have a switch statement to decide how to shade something in closest-hit, and each separate case is a large and complex bit of shading code, then replacing the switch with a callable is sometimes more effective.
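To make the switch-versus-callable structure concrete, here is a minimal host-side C++ sketch of the two dispatch shapes. It is only an analogy: the struct, the three shading functions, and the table are hypothetical names invented for illustration, and plain CPU code does not reproduce OptiX scheduling or GPU register behavior. In real device code, the table lookup would roughly correspond to calling optixDirectCall with an SBT index, and each table entry would be its own __direct_callable__ program.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical shading inputs; stand-ins for whatever closest-hit computes.
struct ShadeInput {
    float n_dot_l;  // cosine between surface normal and light direction
    float albedo;   // surface reflectance
};

// Three hypothetical shading models. In a real shader each of these would be
// a large block of code; they are kept tiny here so the structure is visible.
float shade_diffuse(const ShadeInput& in)  { return in.albedo * in.n_dot_l; }
float shade_emissive(const ShadeInput& in) { return in.albedo; }
float shade_black(const ShadeInput&)       { return 0.0f; }

enum Material : unsigned { kDiffuse = 0, kEmissive = 1, kBlack = 2 };
constexpr std::size_t kNumShaders = 3;

// Variant A: one function containing a switch. The compiler is free to inline
// every case into the caller, which is the "inlined device function" approach.
float shade_switch(Material m, const ShadeInput& in) {
    switch (m) {
        case kDiffuse:  return shade_diffuse(in);
        case kEmissive: return shade_emissive(in);
        default:        return shade_black(in);
    }
}

// Variant B: an indirect call through a table indexed by material.
// This mirrors the shape of a direct callable: the caller only knows an index,
// and the arguments are passed through a real function call.
using ShadeFn = float (*)(const ShadeInput&);
const std::array<ShadeFn, kNumShaders> kShadeTable = {
    {shade_diffuse, shade_emissive, shade_black}};

float shade_table(unsigned index, const ShadeInput& in) {
    assert(index < kShadeTable.size());  // guard against a bad index
    return kShadeTable[index](in);
}
```

Both variants return the same value for the same material, so switching between them in your own code should only change performance and compile time, which is what you would then measure.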

The implication of 2) is that the compiler will usually spill the caller registers to memory to make room for argument passing, which can potentially degrade performance just like any non-inlined function call might.

If your closest-hit program is relatively simple then callables are likely to hurt performance. But if your shader is very complicated, perhaps an arbitrarily large shader graph generated by an artist, then callables might improve your performance.

Another case for callables is when your compile time starts to get too long. Excessive inlining can cause compile times to get very high, sometimes exponentially. One case where I saw a huge win with callables was with a customer who had 45-minute compile times. They replaced their high-level shading blocks with callables, and their compile times dropped to just a few minutes (or maybe less, fuzzy memory) and the overall rendering performance approximately doubled.

I hope that helps a little. I know it’s not necessarily crystal clear which way to go, so my best advice is if your programs might be complex enough that you think you might benefit from callables, or if your compile time is getting long, then try it both ways and do some profiling.


David.

Hi @dhart ,

Thanks for the breakdown! This is helpful, especially regarding the potentially extra registers associated with a callable for simpler routines.