OPTIX_EXCEPTION_CODE_TRACE_DEPTH_EXCEEDED exception with multiple optixTrace calls in same hit program

I have a hit program using Optix 7.4 which works fine if I issue only a single optixTrace call in the program.

I changed the code to issue multiple optixTrace calls in a loop using the same origin point for each, but in a randomized direction. If the loop issues two or more optixTrace calls, I get the exception after a short while.

I’m puzzled since I do pass the current depth as the first (p0) trace program parameters, check the depth against the limit before calling optixTrace, and increment it each time, and since it works if the loop only iterates once.

I’ve probably got a conceptual problem here, that I’m missing. What comes to mind is that recursion is not working like I expect it to work in host based functions. I don’t know if all local variables in a hit program are being treated as a local stack frame, which potentially means lots of register save and restore for each recursive call, or if the model is different.

The other idea I had is I’m running into a self-intersection problem I’m not dealing with properly, but then it still seems that checking the depth limit should protect me.

One of the hit programs is below. At the moment the first loop does not run since params.numLights is zero, but that makes no difference.

I’ve set MAX_TRACE_DEPTH to 4 but also trying values up to 16 with no difference. I should be using the same MAX_TRACE_DEPTH value in calls to optixUtilComputeStackSizes and in the linkOptions.maxTraceDepth passed to optixPipelineCreate. Maybe I’m missing setting it elsewhere, but that’s not obvious.

extern "C" __global__ void __closesthit__lambert_material() {
///    printf("Depth: %d %s\n", optixGetPayload_0(), __FUNCTION__);
    // This closest hit program deals with a mesh that has one or more material specifications

    // Adapted from /home/dave/src/NVIDIA-OptiX-SDK-7.3.0-linux64-x86_64/SDK/cuda/LocalGeometry.h and /home/dave/src/NVIDIA-OptiX-SDK-7.3.0-linux64-x86_64/SDK/cuda/whitted.cu
    const uint3 idx = optixGetLaunchIndex();
    const uint3 dim = optixGetLaunchDimensions();
    int rayIndex = idx.y * dim.x + idx.x;
#ifdef TIMING
    params.profileCounters[HIT_COUNTER][rayIndex] = params.profileCounters[HIT_COUNTER][rayIndex] + 1;
#endif
    unsigned int randomNumber;
#ifndef EMPTY_SHADER
    // Get the index into the texture array for the triangle that was hit. optixGetPrimitiveIndex returns the index of the triangle as defined when building the
    // acceleration data structures.
    const int primIndex = optixGetPrimitiveIndex();

    // Get access to hit data structure used by this shader
    const HitGroupData *hitData = reinterpret_cast<HitGroupData*>(optixGetSbtDataPointer());

    const float2 barycentrics = optixGetTriangleBarycentrics();

    // Get the texture color for this pixel
    const int materialIndex = hitData->mesh.materials.indices[primIndex];
    const float3 color = hitData->mesh.lambertMaterial.specs[materialIndex].diffuseColor;

    const float3 currentRayDirection = optixGetWorldRayDirection();
    const float3 currentRayOrigin = optixGetWorldRayOrigin();
    const float3 ourRayOrigin = currentRayOrigin + optixGetRayTmax() * currentRayDirection;

    // Now get the triangle vertices as object coordinates
    float3 data[3];
    optixGetTriangleVertexData(optixGetGASTraversableHandle(), optixGetPrimitiveIndex(), optixGetSbtGASIndex(), 0.0f, data);
    const float3 triangleNormal = normalize(optixTransformNormalFromObjectToWorldSpace(cross((data[1] - data[0]), (data[1] - data[2]))));
    const float3 triangleP = optixTransformPointFromObjectToWorldSpace((1.0f - barycentrics.x - barycentrics.y) * data[0] + barycentrics.x * data[1] + barycentrics.y * data[2]);

    // Accumulate the light received from all lights in the scene that are not obscured by another mesh
    float3 totalLight = make_float3(0.0f, 0.0f, 0.0f);

    float lightDistance;
    unsigned int currentDepth = optixGetPayload_0();
    for (unsigned int i = 0; i < params.numLights; i++) {
#ifdef TIMING
        params.profileCounters[OCCLUSION_COUNTER][rayIndex] = params.profileCounters[OCCLUSION_COUNTER][rayIndex] + 1;
#endif
        lightDistance = length(params.lights[i].location - ourRayOrigin);
        float3 lightDirection = (params.lights[i].location - triangleP) / lightDistance;
        unsigned int traceResult;
        optixTrace(params.handle, ourRayOrigin, lightDirection, 0.01f, 100.0f, 0.0f, OptixVisibilityMask(255), OPTIX_RAY_FLAG_NONE, SHADOW_RAY, NUM_RAY_TYPES, SHADOW_RAY_MISS, traceResult);
        if (traceResult == LIGHT_HIT) {
            // If a light is not obscured, the intensity of the light reflected to the viewer is intensity * (cos(angleOfIncidence) / (4 * pi * distance**2), from section 1.2.3 of
            // https://pbr-book.org/3ed-2018/Introduction/Photorealistic_Rendering_and_the_Ray-Tracing_Algorithm
            // cos(angleOfIncidence) is calculated by dot(normalize(triangleNormal), normalize(lightDirection))
            totalLight = totalLight + params.lights[i].color * params.lights[i].intensity / dot(normalize(triangleNormal), normalize(lightDirection)) / (12.56637061f * lightDistance * lightDistance);
        }
    }

    if (currentDepth < MAX_TRACE_DEPTH) {
        printf("Depth %d limit %d in %s\n", currentDepth, MAX_TRACE_DEPTH, __FUNCTION__);
        float3 reflectionDirection = currentRayDirection - 2 * dot(currentRayDirection, triangleNormal) * triangleNormal;
        randomNumber = params.randomSeed * rayIndex;
        // Adjust ray direction by a random amount to simulate a surface that is not a perfect mirror.
        float fuzz = hitData->mesh.lambertMaterial.fuzzFactor;
        for (unsigned int i = 0; i < LAMBERT_RAYS; i++) {
///            printf("%03d:%03d:%03d lambert material depth %d\n", idx.x, idx.y, idx.z, currentDepth);
            float3 adjustedDirection;
            adjustedDirection.x = reflectionDirection.x + rand(&randomNumber) * fuzz;
            adjustedDirection.y = reflectionDirection.y + rand(&randomNumber) * fuzz;
            adjustedDirection.z = reflectionDirection.z + rand(&randomNumber) * fuzz;
            adjustedDirection = normalize(adjustedDirection);
            unsigned int p0;
            unsigned int p1;
            unsigned int p2;
            unsigned int newDepth = currentDepth + 1;
            optixTrace(params.handle, ourRayOrigin, reflectionDirection, 0.01f, 100.0f, 0.0f, OptixVisibilityMask(255), OPTIX_RAY_FLAG_NONE, NORMAL_RAY, NUM_RAY_TYPES, NORMAL_RAY_MISS, newDepth, p0,
                            p1, p2);
            // Get the light reflected to us by the mesh that was hit and add it to the total light received by the current mesh
            totalLight = totalLight + make_float3(uint_as_float(p0), uint_as_float(p1), uint_as_float(p2));
#ifdef TIMING
            params.profileCounters[REFLECTION_COUNTER][rayIndex] = params.profileCounters[REFLECTION_COUNTER][rayIndex] + 1;
#endif
        }
    }

    // Combine the texture color of our pixel with the total light received from all non-obscured lights by multiplying the color vectors. This gives us the actual light color for this pixel.
    // Light returned has to be adjusted to account for distance from current point to ray origin.
    float3 finalColor = color * totalLight;
    lightDistance = length(ourRayOrigin - currentRayOrigin);
    setIntersectPayload(RAY_HIT, clamp(finalColor / dot(normalize(triangleNormal), normalize(currentRayDirection)) / (12.56637061f * lightDistance * lightDistance), 0.0f, 1.0f));
#endif
}

I would guess that you’re not counting the optixTrace call for the shadow ray which is called unconditionally for each of your normal rays, so your optixTrace call recursion depth is always one bigger than your MAX_TRACE_DEPTH after you added that.

I didn’t think the loop generating shadow rays was running since I set params.numLights to zero, but I commented out those loops and tried again and my code still fails.

The closest hit and miss programs for shadow rays are simple and just return a boolean indicating hit or miss. They do not generate any additional rays.

I wouldn’t think they would effect the trace depth anyway, since I understand that when I call optixTrace, the code which called optixTrace does not get control back until after the generated ray is processed, and the ray is no longer on any stack or affecting depth of recursion.

Then what is the currentDepth value in payload register 0 when first entering the closest hit program?
If that is 0, you’re not counting the optixTrace call inside the ray generation program as level in your trace depth count and the stack and pipeline link options are too small.

That max trace depth value is simply the maximum number of nested trace calls.
E.g. when setting the maxTraceDepth to 0 you cannot call optixTrace inside the ray generation program.

The code excerpt as it is now will then still not work when numLights is actually having lights, because the optixTrace call for the shadow ray is not guarded against your max trace depth.

I finally resolved this. I was unintentionally using the same payload register to hold the ray recursion depth and a flag indicating whether the ray ended up as a hit or a miss. The closest hit and miss programs set the payload register as a hit or miss, clobbering the recursion depth counter.

Once I fixed the problem, there are no more crashes.

In trying to figure out what was wrong, I added a check if the current recursion depth was > maximum allowed depth and calling a CUDA printf and then additionally added a null pointer dereference to force a crash at that point.

On Linux, the CUDA printf buffer apparently is not flushed on either an Optix exception or on a null pointer dereference since I never saw the message telling me the max depth was exceeded. Also, if I ran with cuda-gdb I was not able to disassemble the code to get a clue where the program stopped since the disassemble and x commands told me the instruction address was not within any function.

I solved this by moving my code to Windows in the hope that debugging support was better. I found that I could at least set breakpoints on lines of code within my closest hit program and have the debugger trap on those lines.

Once I saw that the limit was being exceeded and didn’t understand why since I was supposedly ensuring I was within limits previously, I was able to figure out why this was always failing for me with my loop issuing 2 or more optixTrace calls.

I’m impressed by Optix, it’s just difficult to debug anything.