How to pass a buffer of graph nodes to OptiX?


I’m having trouble passing a buffer of graph nodes to OptiX.

Is it possible?
Do you have code examples showing how to do so, or some pointers?

More details

We need to trace rays against a specific set of graph nodes (transform nodes, more specifically).

This minimal example works for a single node passed as a context variable.

// Host
context["MyTestNode"]->set(myObject.transformNode); // transformNode is an optix::Transform

// Device
rtDeclareVariable(rtObject, MyTestNode, , ); // rtObject on the device side
rtTrace(MyTestNode, ...);
/* this works */

Now we need to support a dynamic set of nodes so we’re trying to send them to the device in a buffer.

// Host
memcpy(nodeBuffer->map(), nodes.data(), sizeof(optix::Transform) * nodes.size()); // nodes is a vector<optix::Transform>
nodeBuffer->unmap(); // Buffer set up as RT_FORMAT_UNSIGNED_INT, because with RT_FORMAT_USER and an element size of sizeof(optix::Transform), OptiX warns that the data should be 4 bytes long

// Device
rtBuffer<rtObject> MyTestBuffer;
for (int i = 0; i < MyTestBuffer.size(); ++i) {
    rtObject node = MyTestBuffer[i];
    rtTrace(node, ...);
    /* this does not work */
}

I suspect that it’s a type/data-size issue.
There are several types available for nodes (rtObject, RTobject, RTtransform, optix::Transform, optix::TransformObj), but I’m pretty confused about which ones to use in this case, on the host and device sides.


Hi wanmer,

It is not currently possible to pass a buffer of transform nodes to OptiX. You’re right that it’s a type issue: as you noticed, there is no RT_FORMAT_TRANSFORM. You can pass a transform node via an OptiX variable, but I realize that’s not helpful when you have many transforms, or a dynamic number of them.

We would love to hear a little bit more about your use case, since it sounds interesting.

There are a couple of ways around this issue, depending on what you are trying to do. It mostly depends on how you are combining the results of rtTrace inside your loop.

You might be able to put your loop on the host side and launch once for every transform node. Of course, if you have a lot of transforms and a small trace workload, that might be super slow. Alternatively, you could put all your transforms into a group with a NoAccel acceleration, and then attach an any-hit program to the geometry under each transform. That way, the any-hit program will execute once for every transform that is intersected. With a little CUDA glue, you should be able to structure it logically like your example.
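Concretely, the second option would look something like this on the host side. This is just a sketch against the optixpp wrapper; lightGroup and transforms are placeholder names, and depending on your SDK version you may need setChildCount/setChild instead of addChild:

```cpp
// Host: gather the dynamic set of transforms under one group with no BVH.
optix::Group lightGroup = context->createGroup();
lightGroup->setAcceleration(context->createAcceleration("NoAccel"));
for (optix::Transform& t : transforms)   // transforms: your dynamic set
    lightGroup->addChild(t);
context["LightGroup"]->set(lightGroup);  // one rtObject variable on the device

// Device: a single rtTrace(LightGroup, ...) then visits every transform's
// subtree, and the any-hit program fires for each intersected child.
```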

I hope that helps. Let us know if neither of those suggestions is feasible. And if you are willing & able to share more details about your project, please do. Maybe there are other ways to achieve your goal.


Hi David,
Thanks for your suggestions.

More details about our use case:

We have Multiple Importance Sampling in our pathtracer, and for each surface/mesh light interaction, we:
1. Do light sampling: we get a direction from the light by sampling its surface and we evaluate this light ray
2. Do BSDF sampling: we get a direction from the object surface by sampling its material’s BSDF and we evaluate this light ray

So our OptiX graph is structured like this:

RootNode
├── RootGeometryNode
│   ├── Obj1
│   ├── Obj2
│   └── ...
├── Light1Node
├── Light2Node
└── ...

For each sample, we use rtTrace() to check that our rays are not occluded by other objects and that they do hit the light source.
For sample 1, we just check visibility (we know we’re aiming at the appropriate light source).
For sample 2, we need to make sure we only check intersection against one specific light, not all light sources (we don’t want to get the energy from another light).

And the code looks like this:

// Light sampling

Ray lightRay = /* ray to the light source */;

rtTrace(RootGeometryNode, lightRay, payload); // To check visibility, ignoring all light sources

if (!hit(payload))
    // The ray is not occluded
    /* Evaluate the BSDF for this ray... */

// BSDF sampling

Ray bsdfRay = /* ray from the surface */;
rtObject lightNode = /* Transform node of the specific light we're considering */;

rtTrace(RootGeometryNode, bsdfRay, payloadGeom); // To check visibility, ignoring all light sources
rtTrace(lightNode, bsdfRay, payloadLight);       // To check that we hit our light source and retrieve its energy

if (!hit(payloadGeom) && hit(payloadLight))
    // The ray is not occluded and it hits the light source
    /* Evaluate the BSDF for this ray... */

Basically, when evaluating a surface/light pair, we need to filter out other light sources. We want to keep things as fast as possible so tracing against the light’s sub-graph seemed like a good compromise (rather than for instance tracing against the whole graph and filtering intersections with some payload flags).

You say that it’s not currently possible to pass a buffer of transform nodes. Is it also impossible to pass a buffer of Group nodes?

For now, we’re going to directly compute the ray/mesh intersection in the MIS routine. It’s ok since we only use area lights for the moment. However, it will be a problem when we support more complex mesh lights.

We’re happy to take any suggestion!

That’s correct, we don’t currently have support for passing a buffer of nodes of any kind; nodes need to be assigned to RTvariables. We will discuss your (entirely reasonable) MIS use case internally and evaluate whether passing buffers of nodes is something we should support.

I understand the intent, and it seems like a good idea to me. Still, I’d recommend validating the assumption that directly tracing individual lights is faster than tracing the scene and filtering. It depends on many things, of course, but I wouldn’t be surprised if there are times when always tracing against the scene root node and filtering intersections is faster than tracing against lots of individually managed objects, for reasons that might be unintuitive… especially on RTX hardware.
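For what it’s worth, that filtering can be as small as an any-hit program that rejects every light except the one targeted by the current BSDF sample. A sketch with hypothetical names: lightId would be a variable you set per light, and targetLightId a field of your own payload struct:

```cpp
// Device-side sketch (OptiX 6 style); PerRayData is your payload struct.
rtDeclareVariable(int, lightId, , );                  // set on each light's material
rtDeclareVariable(PerRayData, payload, rtPayload, );

RT_PROGRAM void anyHitFilterLights()
{
    // Skip every light other than the one this BSDF sample is aimed at.
    if (lightId != payload.targetLightId)
        rtIgnoreIntersection();
}
```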


An alternative approach, common to many physically based renderers, is to sample a single light, but generate a PDF based on the combination of all lights’ PDFs for that sample. Then you can trace against the entire set of lights and count the contribution regardless of which light is hit.