What is the best practice for implementing a shader graph in OptiX?


Pixar mentioned that they use callable programs to build their shading network, which can scale to be huge:

starts at 05:45

My question is: is that the best practice?

  • Will callable programs be auto-merged / baked into more efficient code, or does this just spread function pointers everywhere?
  • How does it scale as the number of nodes increases?

Some discussion about function pointers:

Redshift said they found function pointers to be too slow, so they use a big switch statement instead (starts at 05:00).

I'm confused about what the best way to do this is. Suggestions are welcome.



I think the answer depends on how far you want to take that. You would have to balance between maximum flexibility vs. maximum performance for your use case.

OptiX is using a jump table for all reachable kernel functions internally as well, so that agrees with the second presentation’s statements, and you wouldn’t need to care about that part.
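To make the two dispatch styles from the presentations concrete, here is a minimal CPU-side C++ sketch. The node types and function names are hypothetical and only for illustration; this is not how OptiX implements its jump table internally.

```cpp
#include <cassert>

// Hypothetical node types of a tiny shading network (illustrative only).
enum NodeType { NODE_ADD = 0, NODE_MUL = 1, NODE_COUNT = 2 };

inline float nodeAdd(float a, float b) { return a + b; }
inline float nodeMul(float a, float b) { return a * b; }

// Variant A: indirect call through a table of function pointers.
using NodeFn = float (*)(float, float);
static const NodeFn kNodeTable[NODE_COUNT] = { nodeAdd, nodeMul };

inline float evalViaTable(NodeType type, float a, float b)
{
    const int index = static_cast<int>(type);
    assert(index >= 0 && index < NODE_COUNT);
    return kNodeTable[index](a, b);
}

// Variant B: one big switch. Every case is visible to the compiler,
// so the callee bodies can be inlined into the caller.
inline float evalViaSwitch(NodeType type, float a, float b)
{
    switch (type)
    {
        case NODE_ADD: return nodeAdd(a, b);
        case NODE_MUL: return nodeMul(a, b);
        default:       return 0.0f; // Unknown node type.
    }
}
```

Both variants compute the same result; the difference is only in how the call is resolved, which is the trade-off the two presentations disagree on.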

The callable programs mechanism in OptiX is rather mighty, as you’ve seen in Pixar’s presentation.
I’m using them a lot as well, but didn’t take it as far as Pixar’s shader graph.
I’ve taken a middle ground and use buffers of callable program IDs to implement all “fixed function” elements. That includes different lens shaders, light sampling, EDF evaluation, and BSDF sampling and evaluation functions. That greatly reduces the overall kernel size already.
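As a rough CPU-side analogy of such a buffer of callable program IDs, the following C++ sketch uses plain function pointers in a vector, where the index plays the role of the program ID. All names (`FunctionTable`, `evalDiffuse`, `evalSpecular`) are made up for this example; in OptiX device code the entries would be `rtCallableProgramId` values in a buffer instead.

```cpp
#include <cassert>
#include <vector>

// Stand-in for a bindless callable program: a plain function pointer.
using BsdfEvalFn = float (*)(float cosTheta);

// Lambert-like lobe: 1/pi times the cosine, zero below the horizon.
inline float evalDiffuse(float cosTheta)
{
    return (cosTheta > 0.0f) ? cosTheta * 0.318309886f : 0.0f;
}

// Singular (specular) lobe: nothing to evaluate for direct lighting.
inline float evalSpecular(float /*cosTheta*/)
{
    return 0.0f;
}

// Stand-in for a buffer of callable program IDs.
// The index returned by add() acts as the "program ID".
struct FunctionTable
{
    std::vector<BsdfEvalFn> bsdfEval;

    int add(BsdfEvalFn fn)
    {
        bsdfEval.push_back(fn);
        return static_cast<int>(bsdfEval.size()) - 1;
    }

    float call(int id, float cosTheta) const
    {
        assert(id >= 0 && id < static_cast<int>(bsdfEval.size()));
        return bsdfEval[id](cosTheta);
    }
};
```

A material then only needs to store the integer ID of its BSDF functions, and the calling code stays the same no matter how many BSDF types are registered.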

The part which builds the individual material graphs generates CUDA code and compiles each material shader at runtime using nvrtc (takes about a second per material) and caches it.
Materials which result in the same shader will reuse that PTX source program with different material parameters (e.g. if there are only three wood shaders for matte, oiled, and coated, there could be hundreds of different woods, defined by their respective parameters and texture maps).
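The caching idea can be sketched in plain C++ like this. It is a simplified illustration, not the actual renderer code: `ShaderCache`, `MaterialInstance`, and the `compile` placeholder are hypothetical, and the real compile step (NVRTC producing PTX) is replaced by a counter.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical cache: generated shader source -> index of the compiled shader.
class ShaderCache
{
public:
    // Returns the shader index for this source, compiling only on a cache miss.
    int getOrCompile(const std::string& source)
    {
        auto it = m_cache.find(source);
        if (it != m_cache.end())
            return it->second; // Same source: reuse the compiled shader.

        const int shaderIndex = compile(source);
        m_cache.emplace(source, shaderIndex);
        return shaderIndex;
    }

    int compileCount() const { return m_compileCount; }

private:
    // Placeholder for the expensive runtime compilation step.
    int compile(const std::string& /*source*/) { return m_compileCount++; }

    std::unordered_map<std::string, int> m_cache;
    int m_compileCount = 0;
};

// A material instance pairs a shared shader with its own parameter block.
struct MaterialInstance
{
    int shaderIndex;    // Which compiled shader to call.
    int parameterIndex; // Which entry of the parameter buffer to read.
};
```

With this layout, hundreds of wood materials would only differ in `parameterIndex`, while the compile cost is paid once per distinct shader.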

I hold the material parameters globally in a buffer of structs (containing unions). The index into that buffer is given to the material traverser program, so a material instance is built by connecting a material traverser with a set of material parameters. A material parameter description, also generated at runtime, defines which material parameter is placed where in that struct.
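One possible shape for such a parameter buffer, as a minimal C++ sketch. The slot count, the union members, and the name-to-slot map are assumptions made for this example and not the actual layout used in the renderer.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One parameter slot. Which member is valid is defined by the layout description.
union ParameterSlot
{
    float f;
    int   i; // For example a texture ID.
};

// Assumed fixed size, so that all materials share one struct type.
constexpr int kMaxSlots = 8;

struct MaterialParameters
{
    ParameterSlot slots[kMaxSlots];
};

// Generated alongside the shader: parameter name -> slot index.
using ParameterLayout = std::map<std::string, int>;

inline void setFloat(MaterialParameters& params, const ParameterLayout& layout,
                     const std::string& name, float value)
{
    const int slot = layout.at(name); // Throws std::out_of_range for unknown names.
    assert(slot >= 0 && slot < kMaxSlots);
    params.slots[slot].f = value;
}

inline float getFloat(const MaterialParameters& params, const ParameterLayout& layout,
                      const std::string& name)
{
    const int slot = layout.at(name);
    assert(slot >= 0 && slot < kMaxSlots);
    return params.slots[slot].f;
}
```

The buffer itself is then just a `std::vector<MaterialParameters>` on the host side, and the shader only needs the index of its entry plus the knowledge of which slot holds which parameter.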

These “material traverser” bindless programs are put in a buffer of bindless callable program IDs as well and that is basically a big function table you can index at will (also from the ray generation program which means you can implement integrators for any light transport algorithm there). That’s not possible with bound callable programs. I don’t use them at all anymore.

The material traverser can be called for different requests: calculating the geometry displacement (this is tricky, I can do object space only and it’s slow when done dynamically at runtime), normal, and cutout opacity; getting material global parameters (IOR, absorption and scattering coefficients, thin-walled status); estimating the sampling probabilities and modifiers; and evaluating the BSDF for direct lighting.
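A single entry point serving several request types could look roughly like the following C++ sketch. The request names, the `MaterialParams` and `MaterialState` structs, and the simple diffuse evaluation are hypothetical simplifications covering only three of the requests listed above.

```cpp
// Hypothetical request types, loosely following the list above.
enum TraverserRequest
{
    REQUEST_CUTOUT_OPACITY,
    REQUEST_GLOBAL_PARAMETERS,
    REQUEST_EVAL_BSDF
};

// Hypothetical per-material inputs.
struct MaterialParams
{
    float opacity;
    float ior;
    float albedo;
};

// Results written by the traverser. Each request fills only its own fields.
struct MaterialState
{
    float cutoutOpacity = 1.0f;
    float ior           = 1.0f;
    float bsdf          = 0.0f;
};

// One entry point per material shader; the caller selects the work via 'request'.
inline void traverseMaterial(TraverserRequest request, const MaterialParams& params,
                             float cosTheta, MaterialState& state)
{
    switch (request)
    {
        case REQUEST_CUTOUT_OPACITY:
            state.cutoutOpacity = params.opacity;
            break;
        case REQUEST_GLOBAL_PARAMETERS:
            state.ior = params.ior;
            break;
        case REQUEST_EVAL_BSDF:
            // Diffuse lobe as a placeholder: albedo / pi * cosine.
            state.bsdf = (cosTheta > 0.0f) ? params.albedo * 0.318309886f * cosTheta : 0.0f;
            break;
    }
}
```

The integrator can then ask for the cutout opacity in an any-hit style test without paying for a full BSDF evaluation, and request the BSDF only when it does direct lighting.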

For a limited number of material shaders in a scene (like < 100) this gives pretty small kernels, although I haven’t optimized it completely yet.

You can find my GTC presentations which explain some of these here: http://on-demand-gtc.gputechconf.com/gtcnew/on-demand-gtc.php
Slide 12 in my GTC 2017 presentation shows a block diagram of that renderer architecture.
Some of that is obsolete with the MDL SDK availability and its PTX code generation backend now.

OptiX is pretty flexible in that regard and there would be other approaches possible as well. Pick what matches your use case best.

Thank you Detlef, very helpful answer.
I have watched your 2016 and 2017 presentations, which gave me useful information as well.

I am going to try…