Spawning 4 more rays in miss program?

Basically, after an optixTrace call, if the ray misses I want to spawn 4 more rays at different locations, and if it hits, it should terminate. Is there a way to accomplish this using OptiX?

Yes, you control which ray is shot, and when, inside the OptiX device programs.

Your algorithm description is a little sparse on the details, so I’m assuming the four additional rays are not spawning additional four rays recursively and the whole algorithm terminates on the very first intersection.

Since the four additional rays should be started when the first one misses, this would best be done inside the ray generation program. (It’s also possible to call optixTrace from the miss program, but I wouldn’t do this.)

So here’s the algorithm as pseudocode:

#include <optix.h>

struct LaunchParameters
{
  // 8 byte alignment
  OptixTraversableHandle topObject;

  // You can also use whatever pointer type your output data should be here, like float* etc.
  CUdeviceptr outputBuffer;
};

extern "C" __constant__ LaunchParameters launchParameters;

extern "C" __global__ void __raygen__probe()
{
  // This is the resulting intersection distance of the first intersection of any of the five rays.
  float distance = -1.0f;  // Negative means miss.

  // The algorithm shoots 1 + 4 rays at maximum per launch index.
  for (unsigned int i = 0; i < 5; ++i)
  {
    // Calculate the ray depending on the index.
    // i == 0 is the primary ray. 1 to 4 are the additional probe rays, shot only when all previous rays missed.
    float3 origin    = ... // Calculate primary and additional four rays depending on index i.
    float3 direction = ...
    float  tmin      = 0.0f; // Must be non-negative.
    float  tmax      = smallestValueWhichStillCoversTheWholeScene; // Must be > tmin.

    // Let's assume the closest hit program stores the closest intersection distance of the first hitting ray.
    // Payload registers are unsigned int. Encode the float bits into that.
    unsigned int payloadDistance = __float_as_uint(-1.0f); // Default to miss result.
    
    optixTrace(launchParameters.topObject,
               origin, direction,
               tmin, tmax, 0.0f, // tmin, tmax, time
               OptixVisibilityMask(0xFF), OPTIX_RAY_FLAG_NONE, // Other ray flags are possible here.
               TYPE_RAY_PROBE, NUM_RAY_TYPES, TYPE_RAY_PROBE, // If there is only one ray type in your program, these values are 0, 1, 0
               payloadDistance); // Add other payload registers when you need more information.

    distance = __uint_as_float(payloadDistance); 
    if (distance >= tmin) // A miss leaves the default -1.0f; any reported hit distance is >= tmin.
    {
      // If any data needs to be stored, for example which ray index hit, remember it here. 
      // ...
      break; // Terminate on first intersection of any ray.
    }
    // Keep shooting the next probe ray until there is an intersection or all five rays missed.
  }  

  // All OptiX output goes to buffers!
  // Store your desired result into the outputBuffer whose pointer is stored inside your launch parameters.

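  // optixGetLaunchDimensions() and optixGetLaunchIndex() return uint3.
  // Plain CUDA has no make_uint2(uint3) overload, so keep the uint3 and only use .x and .y below.
  const uint3 theLaunchDim   = optixGetLaunchDimensions();
  const uint3 theLaunchIndex = optixGetLaunchIndex();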

  // When using 64bit CUdeviceptr values, they can be cast to any type here
  float* buffer = reinterpret_cast<float*>(launchParameters.outputBuffer);

  // This assumes the optixLaunch dimension (width * height, with depth == 1) matches the outputBuffer size (number of elements).
  const unsigned int index = theLaunchDim.x * theLaunchIndex.y + theLaunchIndex.x;

  buffer[index] = distance; // negative when all rays missed, positive intersection distance when any probe ray hit.
}


extern "C" __global__ void __closesthit__probe()
{
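  // Return the intersection distance in payload register 0.
  // Inside a closest hit program, optixGetRayTmax() returns the t value of the closest hit,
  // which is the world space distance when the ray direction is normalized.
  float distance = optixGetRayTmax(); // Just for example clarity.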
  optixSetPayload_0(__float_as_uint(distance));
}

If you don’t need the closest intersection but only a boolean result (did any of the probe rays hit anything at all?), then the optixTrace ray flags can be set to terminate on the very first hit found (OPTIX_RAY_FLAG_TERMINATE_ON_FIRST_HIT), which is not necessarily the closest hit.
That looks like this: https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo10/shaders/brdf_diffuse.cu#L180

Thanks for the response. I was wondering now: what if each of those 4 rays should also be able to spawn 4 more rays when it misses, and otherwise terminate?

“Your algorithm description is a little sparse on the details, so I’m assuming the four additional rays are not spawning additional four rays recursively and the whole algorithm terminates on the very first intersection.”

Basically this.

It’s possible with OptiX, and there are multiple ways you might handle this.

One way is to generalize what Detlef suggested, and loop over your spawned rays in your raygen program. You’ll need a way to collect spawned rays to work on in local or global memory, so this is feasible for a small number of spawned rays, but probably a bad idea if you have deep recursion and a branching factor of 4. The number of rays can increase exponentially, and you should avoid trying to keep a lot of intermediate state around for rays in flight in a single thread.
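To make the first option a bit more concrete, here is a rough CPU-side sketch of the bookkeeping, in plain C++ rather than OptiX device code. It assumes a hard cap on the spawn depth (kMaxDepth) and a hypothetical trace callback standing in for optixTrace plus payload decoding (negative return value means miss). None of these names come from the OptiX API. The point is only that a depth-first loop with a small fixed-size stack is enough, as long as the depth is bounded.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-ray state. In a real raygen program you would store whatever
// is needed to rebuild origin and direction for the spawned ray.
struct PendingRay  { std::uint32_t depth; std::uint32_t child; };
struct ProbeResult { std::uint32_t raysTraced; std::uint32_t hits; std::uint32_t peakStack; };

constexpr std::uint32_t kBranch   = 4; // rays spawned per miss
constexpr std::uint32_t kMaxDepth = 3; // assumed cap: rays at this depth never spawn

// Depth-first order pops one ray and pushes kBranch per level, so the stack
// never holds more than (kBranch - 1) * kMaxDepth + 1 entries.
constexpr std::uint32_t kStackCapacity = (kBranch - 1) * kMaxDepth + 1;

// trace(ray) is a stand-in for optixTrace: it returns the hit distance,
// or a negative value when the ray missed.
template <typename TraceFn>
ProbeResult probeTree(TraceFn trace)
{
  PendingRay stack[kStackCapacity];
  std::uint32_t size = 0;
  stack[size++] = PendingRay{0, 0}; // the primary ray

  ProbeResult result{0, 0, 1};
  while (size > 0)
  {
    const PendingRay ray = stack[--size];
    ++result.raysTraced;

    const float distance = trace(ray);
    if (distance >= 0.0f)
    {
      ++result.hits; // A ray that hits terminates and spawns nothing.
      continue;
    }
    if (ray.depth == kMaxDepth)
    {
      continue; // Missed, but the depth cap stops further spawning.
    }
    for (std::uint32_t child = 0; child < kBranch; ++child)
    {
      assert(size < kStackCapacity);
      stack[size++] = PendingRay{ray.depth + 1, child};
    }
    if (size > result.peakStack)
    {
      result.peakStack = size;
    }
  }
  return result;
}
```

With a callback that always misses, this traces 1 + 4 + 16 + 64 = 85 rays for a single launch index while never holding more than 10 pending rays. That shows both sides of the trade-off: the per-thread state stays small, but the amount of work per thread still grows exponentially with the depth cap.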

Another way is to use actual recursion. You can call optixTrace from your miss shader, which can trigger another miss shader, which in turn calls optixTrace again. One of the reasons Detlef advised against this is that you have to calculate your maximum stack depth and allocate memory for it (see the optix*SetStackSize() functions), and it’s very easy to run out accidentally, or to miscalculate and experience hard-to-debug crashes. So recursion can be done, but will probably cause you some pain, and the amount of recursion you can get might end up being fairly limited in practice.

A third way is to write a wavefront renderer and process one ‘wave’, or one segment of ray depth, at a time per OptiX launch / kernel. Save all the ray results to a buffer, and then generate a new wave (a new launch) to start where the previous one left off. Each wave can terminate rays and add new ones as you see fit. This is a very expensive option because the memory traffic required is significantly more than when you spawn a ray “inline” during a thread, but the advantage is that you will have the ability to control your memory usage and never risk running out, as you do with the two suggestions above. You might have to tile your renders and have a work queue management system. And you might have to wait a long time for the results, but compared to the other options it will be relatively easy to get the results without hitting any major walls.
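The wave-by-wave structure can be sketched on the CPU like this. It is plain C++ with two std::vector queues, not an actual OptiX host program: each pass of the outer loop stands in for one optixLaunch over the current queue, and trace is a hypothetical callback replacing optixTrace (negative return value means miss). On the GPU, the append to the next queue would more likely be an atomic counter into a preallocated buffer, which is where the memory budget gets enforced.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical queue entry: which output element the ray belongs to, and its depth.
struct QueuedRay { std::uint32_t pixel; std::uint32_t depth; };

// Returns the number of hits accumulated per pixel.
template <typename TraceFn>
std::vector<std::uint32_t> runWaves(std::uint32_t pixelCount, std::uint32_t maxDepth, TraceFn trace)
{
  std::vector<std::uint32_t> hitsPerPixel(pixelCount, 0u);

  // Wave 0: one primary ray per pixel.
  std::vector<QueuedRay> current;
  std::vector<QueuedRay> next;
  for (std::uint32_t pixel = 0; pixel < pixelCount; ++pixel)
  {
    current.push_back(QueuedRay{pixel, 0u});
  }

  while (!current.empty())
  {
    next.clear();
    // One pass over 'current' stands in for one launch of size current.size().
    for (const QueuedRay& ray : current)
    {
      if (trace(ray) >= 0.0f)
      {
        ++hitsPerPixel[ray.pixel]; // Hit: record the result, the ray terminates.
        continue;
      }
      if (ray.depth == maxDepth)
      {
        continue; // Miss at the depth cap: nothing more to spawn.
      }
      for (std::uint32_t child = 0; child < 4; ++child)
      {
        next.push_back(QueuedRay{ray.pixel, ray.depth + 1});
      }
    }
    std::swap(current, next); // The spawned rays become the next wave.
  }
  return hitsPerPixel;
}
```

Because every wave is an explicit queue, you can see its size before launching it, and split or tile it if it would not fit, which is the control over memory usage mentioned above.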

Do you have specific needs for a branching factor of 4, or any number higher than 1? Note that what people do most often is use a branching factor of 1. This means that when you want to cast secondary rays, each ray that hits or misses spawns only a maximum of 1 replacement ray. One way to think about this is it’s conceptually the same optimization as tail recursion. A branching factor of 1 is generally what people mean when they talk about “path tracing” (though the term does have overloaded meanings). One major benefit of a branching factor of 1 is that you can loop in your raygen program, and you don’t need to use recursion or wavefront, or worry about an exponential memory explosion. A higher branching factor leads to exponential work for a pixel, and each ray has exponentially diminishing results. I think it is considered best practice to expend the majority of work effort early in your ray ‘tree’, like in the first and second path segments, rather than deeper in the tree, and instead of having a high branching factor, spend the time you save casting more primary rays. Apologies for lecturing if you know all this and have definite reasons for needing a higher branching factor.
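To put rough numbers on the exponential growth, here is a tiny helper in plain C++. The function name is made up for illustration, and it assumes the worst case where every ray down to maxDepth misses and spawns the full number of successors.

```cpp
#include <cstdint>

// Worst-case number of rays traced for one primary ray when every ray at
// depth 0 .. maxDepth - 1 spawns 'branch' successors: the sum of branch^d for d = 0 .. maxDepth.
inline std::uint64_t worstCaseRays(std::uint64_t branch, std::uint32_t maxDepth)
{
  std::uint64_t total = 0;
  std::uint64_t level = 1; // number of rays at the current depth
  for (std::uint32_t depth = 0; depth <= maxDepth; ++depth)
  {
    total += level;
    level *= branch;
  }
  return total;
}
```

For a maximum depth of 8, a branching factor of 1 gives 9 rays per primary ray, while a branching factor of 4 gives 87381. Spending that same budget on additional primary rays is the trade-off described above.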


David.