Question about function vs callable programs

Hi all,
I am using OptiX 6.0, with CUDA 10.0.130, Ubuntu 18.04, gcc 7.3.0, and driver 418.43.
I am not sure when to use callable programs versus plain functions. Up to now, I have been using callable programs whenever I needed a function. I have now tried changing some of them to plain functions.
For instance, I have declarations like the following:

__forceinline__ __device__ void traceSomeRay(RayPayload& rayPayload) {
  ...
  rtTrace(root, ray, rayPayload);
}
__forceinline__ __device__ void traceAnotherRayType(RayPayload& rayPayload) {
  ...
  rtTrace(root, ray, rayPayload);
}

Something similar is done in the advanced samples so I do not see any problem. I can put those definitions in a traceFunctions.h file and include and use them with different ray generation programs.
I can also include the file in a closest hit program and do recursive tracing.
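To make the setup concrete, here is a minimal sketch of what such a shared header could look like. The payload fields, the extra origin/direction parameters, the ray type index 0, the epsilon, and the name traceRadianceRay are assumptions for illustration only; they are not the actual contents of my traceFunctions.h.

```cuda
// traceFunctions.h (illustrative sketch, not the real file)
#pragma once
#include <optix_world.h>

// Hypothetical payload layout; the real RayPayload may differ.
struct RayPayload
{
    float3 radiance;
    int    depth;
};

// Top-level scene object. Each .cu file that includes this header is
// compiled to its own PTX, so the declaration is repeated per program.
rtDeclareVariable(rtObject, root, , );

// Inlined helper: the body is duplicated into every program that calls it.
__forceinline__ __device__ void traceRadianceRay(const float3& origin,
                                                 const float3& direction,
                                                 RayPayload&   payload)
{
    // Ray type 0 and tmin = 1e-4f are assumed values.
    optix::Ray ray = optix::make_Ray(origin, direction, 0u, 1e-4f, RT_DEFAULT_MAX);
    rtTrace(root, ray, payload);
}
```

A ray generation program and a closest hit program would both include this header and call traceRadianceRay; in the closest hit case the caller would typically check payload.depth against a maximum before recursing.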

The documentation suggests using callable programs for “changing of the target of a function call at run time”, which I do not really do. But it also says:
“Also, if you have a function that is invoked from many different places in your OptiX node graph, making it an RT_CALLABLE_PROGRAM can reduce code replication and compile time, and potentially improve run time through increased warp utilization.”

So the question is: is it better to use callable programs than plain functions like these, even if I use them for recursive tracing?
I am not sure whether I am doing something wrong. I have tested both approaches in a small scene and I cannot see any apparent problem. I do not see any performance difference either.

Thanks for your advice.

For small functions such as these, inlined CUDA functions are almost always the right way to go. If you are running into long JIT compile times, you can try moving some of the inline functions that produce a lot of duplicated code into callable programs, but there can be a performance cost to this. If you are not worried about JIT time, just use the inline functions.

Thanks a lot

Just to break it down into general rules of thumb:

  1. Default to using inline CUDA functions.
  2. Consider using callable programs if:
    • You are running into long JIT compile times and tests show that reducing code size via callable programs improves compile time.
    • You want virtual-function-like behavior from your functions. For instance, a callable program could represent a 3D noise function, and you could then attach a Perlin noise, a Worley noise, etc. at run time.
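As a rough illustration of the second case, the sketch below declares a callable-program slot and two interchangeable implementations on the device side. The variable name noise3d, the program names, and the function bodies are hypothetical placeholders (the bodies are simple stand-ins, not real Perlin or Worley noise); only the OptiX 6 declaration pattern is the point.

```cuda
#include <optix_world.h>

// Slot for any callable program with signature float(float3).
// The host decides at run time which program is bound to it.
rtDeclareVariable(rtCallableProgramId<float(float3)>, noise3d, , );

// Candidate implementation A (placeholder body).
RT_CALLABLE_PROGRAM float perlin_noise(float3 p)
{
    return 0.5f + 0.5f * sinf(p.x + p.y + p.z);
}

// Candidate implementation B (placeholder body).
RT_CALLABLE_PROGRAM float worley_noise(float3 p)
{
    const float fx = p.x - floorf(p.x);
    const float fy = p.y - floorf(p.y);
    const float fz = p.z - floorf(p.z);
    return fminf(fx, fminf(fy, fz));
}

// Caller: invokes whichever program is currently bound to noise3d.
__forceinline__ __device__ float3 shadeWithNoise(const float3& hitPoint)
{
    const float n = noise3d(hitPoint);
    return make_float3(n, n, n);
}
```

On the host, assuming the C++ wrapper is used, the binding would look roughly like creating a program from the PTX entry point "perlin_noise" and assigning it with context["noise3d"]->setProgramId(program); swapping in "worley_noise" then requires no change to the calling code. If the target never changes at run time, the inline version remains the simpler default.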