How to collect all intersections using anyhit?

Dear All,

Similar to the thread “Processing intersections in order”, I’m trying to process all intersections along a ray, modifying a payload on every hit and then collecting the hit information in the raygen program. Currently, I’m calling closest-hit multiple times for each ray, but the rendering speed becomes quite limited as the number of triangles in each mesh grows.

I’d like to try the anyhit approach instead, i.e. shooting only one ray, having the anyhit program update the payload at each intersection, and collecting all payloads in the raygen program.

However, I’m a newbie to OptiX, and I’m confused about the following:

  • What kind of payload data structure should I use, given that the number of intersections is different for each ray?
  • What should I do to modify the payload in anyhit?
  • And especially, how do I collect those intersections in raygen so I can write a custom CUDA kernel to sort or interpolate between them?

Can anyone kindly provide some example code blocks?


Hi Ree,

One example along these lines that we’ve published in the past is optixParticleVolumes; it collects multiple hits using the anyhit program, and then sorts the results in t-order back in the raygen program (because anyhit programs are not guaranteed to be called in t-order along a ray). See optix_advanced_samples/src/optixParticleVolumes in the nvpro-samples/optix_advanced_samples repository on GitHub.

That sample uses OptiX 6, which we don’t recommend using for new projects. But you can study the payload and data handling and sorting process. You can easily do the same thing using OptiX 7.

Some OptiX 7 SDK samples that demonstrate use of anyhit programs are optixCutouts and optixWhitted.

As to what should go in your payload and how to modify it, we’d need a bit more detail on what you want to achieve before we can advise how to organize it. It is important to understand that asking for all intersections along a ray is fundamentally going to take significantly longer than asking for only the closest hit. The rendering speed limits you’re seeing with larger meshes and more intersections per ray are inherent to the problem, and you’re likely to hit the same issue when using anyhit programs. Another thing to be aware of is that storing a collection of hits to memory and sorting them afterward will cost considerable performance because of the memory bandwidth needed. The best advice we can give to improve performance would be to discuss and help brainstorm how you might achieve your goals without storing hit info to memory and/or without processing all intersections along a ray. Of course, some problems do need all intersections, and that’s okay; it’s just helpful to have your expectations match the performance you can achieve.
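As a rough illustration of the collect-then-sort pattern mentioned above, here is a minimal host-side C++ sketch, not code from any SDK sample. It assumes a fixed-capacity per-ray buffer whose 64-bit address is split across two 32-bit payload registers. The names Hit, HitBuffer, MAX_HITS, recordHit and sortHitsByT are made up for this example; in real device code the hit distance and primitive index would come from calls such as optixGetRayTmax() and optixGetPrimitiveIndex() inside the anyhit program.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-hit record; field names are illustrative only.
struct Hit {
    float        t;       // distance along the ray
    unsigned int primID;  // primitive that was hit
};

// Hypothetical fixed-capacity buffer, one per ray (e.g. per launch index).
constexpr int MAX_HITS = 64;
struct HitBuffer {
    int count = 0;
    Hit hits[MAX_HITS];
};

// Split a 64-bit pointer into two 32-bit payload registers, and back.
inline void packPointer(void* ptr, unsigned int& u0, unsigned int& u1) {
    const std::uint64_t bits = reinterpret_cast<std::uintptr_t>(ptr);
    u0 = static_cast<unsigned int>(bits >> 32);
    u1 = static_cast<unsigned int>(bits & 0xffffffffu);
}
inline void* unpackPointer(unsigned int u0, unsigned int u1) {
    const std::uint64_t bits = (static_cast<std::uint64_t>(u0) << 32) | u1;
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(bits));
}

// What an anyhit program would do for each candidate hit: append it to the
// buffer. On the device this would be followed by optixIgnoreIntersection()
// so that traversal continues past the hit.
inline bool recordHit(unsigned int u0, unsigned int u1, float t, unsigned int primID) {
    HitBuffer* buf = static_cast<HitBuffer*>(unpackPointer(u0, u1));
    assert(buf != nullptr);
    if (buf->count >= MAX_HITS) return false;  // buffer full: this hit is dropped
    buf->hits[buf->count++] = Hit{t, primID};
    return true;
}

// What raygen would do once the trace returns: insertion sort by t,
// because anyhit invocations are not guaranteed to arrive in t-order.
inline void sortHitsByT(HitBuffer& buf) {
    for (int i = 1; i < buf.count; ++i) {
        const Hit key = buf.hits[i];
        int j = i - 1;
        while (j >= 0 && buf.hits[j].t > key.t) {
            buf.hits[j + 1] = buf.hits[j];
            --j;
        }
        buf.hits[j + 1] = key;
    }
}
```

The fixed capacity is the main design tradeoff here: it bounds memory per ray but silently drops hits beyond MAX_HITS, so the value would need tuning for the scene. Also note that anyhit may be invoked more than once for the same primitive unless the geometry is built with OPTIX_GEOMETRY_FLAG_REQUIRE_SINGLE_ANYHIT_CALL.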


Hi David,

What I’m actually trying to do is perform volume rendering on a volumetric mesh (basically a tetrahedral mesh aligned with the volume). The density and color values are stored at the vertices of the tet mesh.

During ray-traced volume rendering, suppose I sample t points on the ray. For each sample point I need to identify which tet it belongs to and calculate its barycentric coordinates. Following owl-project/tetMeshQueries on GitHub (a library that demonstrates how to do GPU-accelerated tet-mesh cell location queries using OWL), I built a shared-face geometry, which associates each triangle with the two tets it belongs to (a front one and a back one). I can then call optixTrace to find the barycentric coordinates and ray depth D for each intersection, and interpolate between two neighboring intersections to find the barycentric coordinates of a sample point that lies between them.
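To make the interpolation step concrete, here is a small host-side C++ sketch. It assumes both intersections are expressed as barycentric weights with respect to the same tet (the one the ray segment passes through); under that assumption linear interpolation in depth is exact, because barycentric coordinates are affine in position and position is affine in depth along the ray. Bary4, interpolateBary and interpolateScalar are hypothetical names for this example.

```cpp
#include <cassert>

// Barycentric weights with respect to one tet's four vertices.
struct Bary4 { float w[4]; };

// Interpolate between the entry hit (t0, b0) and the exit hit (t1, b1)
// of the same tet to get the weights of a sample at depth t.
inline Bary4 interpolateBary(float t0, const Bary4& b0,
                             float t1, const Bary4& b1, float t) {
    assert(t1 > t0);
    const float s = (t - t0) / (t1 - t0);  // 0 at entry, 1 at exit
    Bary4 out;
    for (int i = 0; i < 4; ++i)
        out.w[i] = (1.0f - s) * b0.w[i] + s * b1.w[i];
    return out;
}

// Blend a per-vertex scalar (for example density) with the weights.
inline float interpolateScalar(const Bary4& b, const float v[4]) {
    return b.w[0] * v[0] + b.w[1] * v[1] + b.w[2] * v[2] + b.w[3] * v[3];
}
```

For example, a sample halfway between an entry point with weights (0.5, 0.5, 0, 0) and an exit point with weights (0, 0, 0.5, 0.5) gets weights of 0.25 for every vertex.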

Currently, I’m using closest-hit multiple times to collect all intersections, with the code below:

      // call trace
        0.f,    // tmin
        1e20f,  // tmax
        0.0f,   // rayTime
        SURFACE_RAY_TYPE,             // SBT offset
        RAY_TYPE_COUNT,               // SBT stride
        SURFACE_RAY_TYPE,             // missSBTIndex
        u0, u1);

      // Terminate on a miss (signaled here by tetID == -1 and a very large depth)
      if (prd.tetID == -1 && prd.depth > 1e15f) { break; }

      // Move the ray origin to the hit position plus a small offset.
      origin = origin + rayDir * (prd.depth + 1e-6f);

      // Collect prd
      // ...
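For readers who want to experiment with this loop outside OptiX, here is a CPU-only C++ sketch of the same idea. mockClosestHit is a made-up stand-in for optixTrace plus the closest-hit program: it treats the scene as a list of surface depths along one ray. One detail it makes explicit is that each reported depth is relative to the moved origin, so an absolute depth has to be accumulated.

```cpp
#include <cassert>
#include <vector>

// Stand-in for optixTrace + closest-hit: returns the distance from `origin`
// to the nearest surface strictly beyond it, or -1 on a miss.
inline float mockClosestHit(const std::vector<float>& surfaces, float origin) {
    float best = -1.0f;
    for (float s : surfaces) {
        const float d = s - origin;
        if (d > 0.0f && (best < 0.0f || d < best)) best = d;
    }
    return best;
}

// Repeated closest-hit queries, advancing the origin past each hit.
inline std::vector<float> collectDepths(const std::vector<float>& surfaces, float eps) {
    assert(eps > 0.0f);
    std::vector<float> depths;
    float origin = 0.0f;
    for (;;) {
        const float d = mockClosestHit(surfaces, origin);
        if (d < 0.0f) break;           // miss: no more surfaces ahead
        depths.push_back(origin + d);  // depth is relative to the moved origin
        origin += d + eps;             // surfaces closer than eps to a hit are skipped
    }
    return depths;
}
```

The results come out in t-order without any sorting, which is the main thing the anyhit approach gives up.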

I’m wondering whether using anyhit to collect the data would be more efficient than using closest-hit multiple times. Currently my PRD structure looks as follows:

struct Payload
{
    float4 barycentric;    // barycentric coordinate of the intersection (mapped to belonging tet)
    int tetID;             // belonging tet id
    float depth;           // ray depth
};

Any idea? Should I keep the current solution or try anyhit?


It’s possible that using an anyhit program could shave off some of the overheads of casting a new ray for every sample, but the big problem with using anyhit for volume rendering is needing the hits to be in t-order, which leaves you adding code with new overheads to either collect data in memory about all the hits, or trying to design an algorithm that can process hits out of order. Neither is easy. So it’s a tradeoff that depends on a lot of factors, and hard to say which one will be better for volumetric mesh rendering without trying both.

If you have the time, I personally think trying the anyhit approach will be very instructive and help you learn some valuable things about your problem, but I also think it will be difficult to make the anyhit approach be more performant than your existing closest-hit approach. Not impossible, just difficult. So if you want and expect to learn how to do anyhit processing, you’ll have fun, but if you just want it to go faster, it might work or it might be frustrating.

If you want to save hits into memory and sort them in t-order, you may need to consider approaches where you compress and/or recompute some of your payload information - technically the tetID is the only item you truly need, because you can recompute the barycentric and t-depth later. That would reduce your hit info storage and bandwidth down from 24 bytes per hit to 4 bytes. If you can get away with 2 or 3 bytes per tetID, even better.
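The storage arithmetic and the "recompute later" idea can be sketched in host-side C++ as follows. This is only an illustration under assumptions: FullHit, CompactHit and tetBarycentric are hypothetical names, the full record uses a plain float[4] (CUDA’s float4 type is 16-byte aligned, which can add padding), and the barycentric weights are recomputed from the tet’s four vertex positions as ratios of signed volumes.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };
inline Vec3 sub(Vec3 a, Vec3 b) { return Vec3{a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return Vec3{a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Full per-hit record versus a compact one that keeps only the tet ID.
struct FullHit    { float barycentric[4]; int tetID; float depth; };
struct CompactHit { int tetID; };
static_assert(sizeof(FullHit) == 24, "16 + 4 + 4 bytes per hit");
static_assert(sizeof(CompactHit) == 4, "4 bytes per hit");

// Six times the signed volume of the tet (a, b, c, d).
inline float signedVolume6(Vec3 a, Vec3 b, Vec3 c, Vec3 d) {
    return dot(sub(b, a), cross(sub(c, a), sub(d, a)));
}

// Recompute the barycentric weights of point p in the tet with vertices v[0..3]:
// weight i is the volume of the tet with vertex i replaced by p, over the total.
inline void tetBarycentric(const Vec3 v[4], Vec3 p, float w[4]) {
    const float total = signedVolume6(v[0], v[1], v[2], v[3]);
    assert(total != 0.0f);  // degenerate tets are not handled in this sketch
    w[0] = signedVolume6(p, v[1], v[2], v[3]) / total;
    w[1] = signedVolume6(v[0], p, v[2], v[3]) / total;
    w[2] = signedVolume6(v[0], v[1], p, v[3]) / total;
    w[3] = signedVolume6(v[0], v[1], v[2], p) / total;
}
```

Whether the extra arithmetic is cheaper than the saved bandwidth depends on the workload, so it would need to be measured.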