Limit on Registers for Intersection

I am trying to pass data computed in an intersection program. Is the limit for attribute registers set to 8? What if you want to pass more (say 10) during an intersection, is using an address pointer okay from the intersection program?


Currently, yes, you get 8 attribute registers.

It is okay to pass a pointer via attributes, which would use two of your attributes unless you’re using a tricky pointer encoding. But! Be aware you should never try to pass back a pointer to the intersection program’s local memory because that memory is out of scope after intersection. This means it’s unlikely you would ever want to pass a pointer via attributes. More likely, you would pass a pointer to an output buffer into the intersection program using the payload registers or memory somewhere. (But storing to memory is not recommended, and harder than it sounds, due to the multiple out-of-order intersection calls during a given thread.)

Note that best performance may come from computing any extended intersection attributes in a closest-hit or any-hit program if possible, rather than in the intersection program, even if that means doing a bit of seemingly redundant computation. This is because of how often the intersection program is called compared to a hit program. Memory access in the intersection program can have a much larger impact on performance than in hit shaders, so our recommendation is to prune the data needed & returned from intersection down to as little as possible.



Thank you @dhart for the reply.

This is good information to know. However, now that I have thought about it some more, I think I may go ahead and just lose a couple of values I was going to try to pass from the intersection program, given that it will be a lot more robust to just use my 8 attribute registers as usual.

Thank you again, great information and much appreciated.


Thanks again @dhart for the response.

Another question that is a little off topic but I just want to be sure I understand intersection program(s).

Say I am using built-in triangle primitives but I still want to define an intersection program using __intersection__myIntersection (for example). Am I correct in assuming that OptiX 7 will use my intersection program even though I am using built-in triangle primitives?

You’re allowed to use multiple types of intersectors in a scene, and it’s fine to mix the OptiX built-in intersectors (triangles & curves) with your own custom intersectors in the same scene.

The main caveat is that you can’t mix different intersection types in a single GAS (Geometry Accel Structure). What this means is you can make a GAS for your triangles, and then make another GAS for your custom primitives based on your own intersector. Then use OptiX instancing features to create a top-level IAS (Instance Accel Structure) that contains your two or more GASes, and trace rays into the IAS.

“Multiple build inputs can be passed as an array to optixAccelBuild to combine different meshes into a single acceleration structure. All build inputs for a single build must agree on the build input type.”

You can’t use the OptiX API to pass built-in triangle build inputs and then use your own triangle intersector on that data, if that’s what you meant. But that’s just a matter of API usage; to get your own triangle intersector working, you’d use the custom primitive build input rather than a triangles build input.
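As a rough sketch of the structure described above (type enums and fields from the OptiX 7 headers; buffer allocation, the `optixAccelBuild` calls, and error handling are all elided):

```cpp
// Sketch only: one GAS per build-input type, combined under one IAS.
OptixBuildInput triInput = {};
triInput.type = OPTIX_BUILD_INPUT_TYPE_TRIANGLES;             // built-in intersector
// ... fill triInput.triangleArray, then optixAccelBuild -> triGasHandle

OptixBuildInput customInput = {};
customInput.type = OPTIX_BUILD_INPUT_TYPE_CUSTOM_PRIMITIVES;  // your intersection program
// ... fill the AABB array, then optixAccelBuild -> customGasHandle

// Two instances, one per GAS, gathered into a single top-level IAS:
OptixInstance instances[2] = {};
instances[0].traversableHandle = triGasHandle;
instances[1].traversableHandle = customGasHandle;
// ... set transforms, sbtOffset, visibilityMask; build the IAS and trace into it.
```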


Thanks for the fast reply @dhart

Unfortunately the quote above is exactly what I am trying to do. So I guess I should define a custom primitive ‘triangle’ and then use my own intersection program.

Yes, you only need to use the custom primitive build input. The only difference in the API usage is that you provide the bounding boxes explicitly. In other words the only benefit of mixing a custom intersector with the triangle build input would be that OptiX computes the bounding boxes.

Can I ask what the goal is? What are you able to do with a software triangle intersector that you can’t get out of the OptiX hardware intersector? I’m asking for two reasons - in case there’s an alternative higher performance way to achieve what you want, and also in case there are things we should add to the hardware intersector.



I am using the intersection to compute a shading normal and geometric normal off a flat triangle structure.

Quick question with regards to custom primitives, if that is okay.
If I define

OptixBuildInput aabb_input = {};

aabb_input.aabbArray.numPrimitives = 1;

Does aabb_input.aabbArray.numPrimitives = 1 mean that a single primitive is defined, such as a single quad, or a series, such as a single mesh of quads?

Thanks again

The numPrimitives field refers to the number of primitives (your triangles or quads); it’s the length of your AABB array.
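Concretely, for a mesh of N quads you supply N AABBs (one per primitive) and set numPrimitives to N. A sketch using the OptiX 7.0 field names from the question above — the device-side buffer upload is elided:

```cpp
// One OptixAabb per primitive, uploaded to the device beforehand:
// d_aabbs points at N consecutive OptixAabb structs.
CUdeviceptr d_aabbs = /* ... device buffer of N OptixAabb ... */;

OptixBuildInput aabb_input = {};
aabb_input.type = OPTIX_BUILD_INPUT_TYPE_CUSTOM_PRIMITIVES;
aabb_input.aabbArray.aabbBuffers   = &d_aabbs;
aabb_input.aabbArray.numPrimitives = N;   // N quads -> N AABBs, not 1

uint32_t flags[1] = { OPTIX_GEOMETRY_FLAG_NONE };
aabb_input.aabbArray.flags         = flags;
aabb_input.aabbArray.numSbtRecords = 1;
```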

Okay, for what it’s worth, you don’t need a custom primitive in order to get shading normals and geometric normals. In fact, we always recommend computing normals in your hit programs and not your intersection programs. Normals are only needed for shading, not for intersection, and so computing them in the intersector is wasting the normals most of the time (whenever a closer intersection is found after a further one).

The way to get normals in your hit programs when using the built-in hardware triangle intersector is to call optixGetTriangleBarycentrics() which allows you to interpolate your own normals. If you want access to the built-in triangle vertex data, you can enable OPTIX_BUILD_FLAG_ALLOW_RANDOM_VERTEX_ACCESS, and then in your hit program you can call optixGetTriangleVertexData().



Thanks @dhart

Still a lot to learn on my behalf.

Please have a look at the OptiX 7 SDK examples or other OptiX 7 examples. Most of them do what you’re looking for already.

Follow the last link to the OptiX 7.2 release post in here to find them:

Only the vertex positions are stored inside an acceleration structure, which means optixGetTriangleVertexData() only allows calculating the face normal.
For the shading normal you need to store the per-vertex normals yourself, and then you can also store the vertex positions along with them, because it’s faster to read them from buffers than from the usually compressed acceleration structure.

Looks like this for a single-instance hierarchy scene:
And like this for a deeper hierarchy, including motion transforms:

Thank you @droettger for the links and information.