Thanks, I’ve asked internally what’s going on with that.
I’m using bindless callable programs exclusively. Bound callable programs are potentially slower.
Yes, that needs some copying of data to local structures which can then be passed on to the bindless callable program.
I normally pass a reference to a local structure as the argument, e.g. State in the examples below, which is a small one. Often that structure can be used throughout the caller.
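To make that pattern concrete outside of OptiX, here is a rough sketch in plain host C++, not actual device code. The function pointer table only stands in for a buffer of bindless callable program IDs. The State fields, the two sample functions, kCallables and shade are all made up for illustration and are not taken from my renderer.

```cpp
#include <array>

// Hypothetical small per-hit state; the real State has different fields.
struct State {
    float normal[3];
    float texcoord[2];
};

// Stand-in for a bindless callable program: a function taking the state by reference.
using SampleFn = float (*)(const State& state);

// Two made-up "materials" that only read from the shared state.
float sampleDiffuse(const State& state)  { return state.normal[2] * 0.5f; }
float sampleSpecular(const State& state) { return state.texcoord[0] + state.texcoord[1]; }

// Stand-in for a buffer of callable program IDs, indexed by material.
const std::array<SampleFn, 2> kCallables = { sampleDiffuse, sampleSpecular };

// Stand-in for the single closest hit program used for all materials.
float shade(int materialIndex) {
    // Copy the required data into a local structure once...
    State state = { {0.0f, 0.0f, 1.0f}, {0.25f, 0.5f} };
    // ...then hand it to the selected callable by reference, no further copies.
    return kCallables[materialIndex](state);
}
```

The point is only that the caller fills the small structure once and every callable behind the table receives the same reference.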
My main concerns are kernel size and architectural elegance, then memory accesses. Getting those right almost automatically results in good performance. I only need one closest hit program for all materials in my path tracers. One of them implements almost the full Material Definition Language (MDL) spec and, boy, that needs some amount of local storage, but without recursion that is not a problem.
Also, passing references/pointers to the current rtPayload is actually supported, although doing the same with any other variable declared with rtDeclareVariable is generally illegal and results in an error.
Example where I’m using that: