OPTIX_EXCEPTION_CODE_TRAVERSAL_INVALID_HIT_SBT root cause?

Hey,
in my OptiX7.6-based pathtracer (using the OptiX Apps architecture) I again got validation error OPTIX_EXCEPTION_CODE_TRAVERSAL_INVALID_HIT_SBT, but this time only on some materials, while most opaque and MDL materials work fine. Tested with separate settings on different launches.
To me it seems to be a problem somewhere in a volume material, but when I remove another material, the volume material at least does not cause the validation error.
I tried to increase MaxTraversalDepth (from 3 to 5) and pipeline_link_options.maxTraceDepth (from 3 to 5) without any success. In all cases I use 3 ray types: radiance, occlusion and scattering
The object uses in-built triangles, no motion-blur, no re-fitting, no cutout.

From the API Reference I found optixGetExceptionInvalidSbtOffset() and added it into the exception program but no output from there shows up. There is only one global “Exception program record”. Or do I need to add another exception program for the hit groups? I found no example wihtin the SDK.

I’ve seen SBT problem when using multiple GAS objects. - #2 by droettger and I use the index as described there:

sbt_idx=0
sbt_idx_occlusion=1
sbt_idx_scatter=2

sbt_idx=3
sbt_idx_occlusion=4
sbt_idx_scatter=5

sbt_idx=6
sbt_idx_occlusion=7
sbt_idx_scatter=8

sbt_idx=9
sbt_idx_occlusion=10
sbt_idx_scatter=11

count_hitgroup_records=12

while for test I even use pg_null for scattering. pg_null is built using optixProgramGroupCreate with all parameters zero.

In another (closed) case I got this validation error, when a pipeline setting related to curves was invalid. In the current test there are no curves; All objects are out of inbuilt-triangles.

My System:
OptiX 7.6.0 SDK
CUDA 11.8
GTX 1050 2GB
Win10PRO 64bit (version 22H2; build 19045.2846)
8GB RAM
device driver: 531.79
VS2019 v16.11.26
MDL SDK 2020.1.2
Windows SDK 10.0.19041.0

Hard to tell what is going on from that description.

  • When you say you got a validation error, do you mean you enabled OptiX ’ validation mode https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/MDL_renderer/src/Device.cpp#L286

  • The validation error did not report any additional information which SBT index it complained about?

  • What is the maximum traversal depth and maximum trace depth actually needed inside your scene?
    Always use the minimum values for both to reduce the required stack size.

  • Did you calculate the stack size explicitly? (That is obligatory when using callables!)

  • There is only one exception record per pipeline.
    Are you saying you do not catch that exception inside your exception program?
    If not, what happens with validation mode disabled?

  • If the problem moves or disappears when changing the number of materials strip down the scene to the smallest reproducing content.

  • Check all values inside the OptiX instances and Shader Binding Table for correctness (sbtOffset, SBT record headers, number of entries inside all SBT records, their stride, their alignment).

  • Try isolating which optixTrace call (which ray type) is responsible for the invalid SBT index error.
    My OptiX applications are not using three ray types, so you’re actually implementing a different device program architecture which might be the culprit if you’re having issues with volume materials.

  • Check all optixTrace calls for correct arguments.
    Use the formula in chapter 7.3 of the OptiX Programming Guide to check the effective SBT index by printing out all values from the instance and optixTrace call which affect the final SBT index, calculate that yourself, and print it before all optixTrace calls. Check if any of them is out of bounds.

  • Try newer display drivers.

  • Try newer OptiX SDK versions.

1 Like

Thank you for your answer.

I think I found the problem, but some issues still remain.

yes, I enabled OptiX validation mode:
OptixDeviceContextOptions.validationMode = OPTIX_DEVICE_CONTEXT_VALIDATION_MODE_ALL;

no sbt index reported, here the original message:
[ 2][ ERROR]: Validation mode caught builtin exception OPTIX_EXCEPTION_CODE_TRAVERSAL_INVALID_HIT_SBT
Error recording resource event on user stream (CUDA error string: unspecified launch failure, CUDA error code: 719)

printf correctly compiled into the PTX, but when launching this simple program, no output occurs:

extern "C" __global__ void __raygen__raygeneration()
{
  const uint2 launch_index = make_uint2(optixGetLaunchIndex());
  const uint2 theLaunchDim = { CURRENT_WIDTH, CURRENT_HEIGHT };
  printf(" Entering __raygen__raygeneration \n");

  return;
}

(Also no Validation Exception, although validation mode is ON)

when running without validation mode rendering works without crash;

the volume objects did not show up correctly; but that is solved now:

obivously there was a “local IAS” setup, which is needed for motion blur and for volume scattering:

sbtOffset = index * ray_count;  // this is now always zero for the "local IAS"
optix_instances.sbtOffset = sbtOffset;

Generally index * ray_count works for the other cases, but the local IAS is one level under the IAS.
I’ve set it to zero now. (optix_instances.sbtOffset = 0;) Now the validation error is gone.

The “local IAS” function worked properly up to driver 512.15 OptiX 7.4 giving the correct output (latest screenshot from May 2022), although it had that invalid SBT index.
But when I now run the exectuables from that time, the old version on the newer driver, it does not render anything at all anymore.

However, the local IAS is used to start traverval on multi-scattering volumes only against the volume object itself (not the whole scene).

I have set maxTraversalDepth = 3;
to allow IAS+MT+GASes
or IAS+IAS+GASes (volumes)
or IAS+GASes (no motion blur)

For MDL materials the validation error OPTIX_EXCEPTION_CODE_CALLABLE_PARAMETER_MISMATCH occured then, but also solved now

Morphing again also works fine for MDL material objects.

source code for the function deleted in this post now; since it seems to be solved

printf correctly compiled into the PTX, but when launching this simple program, no output occurs:

That’s a defect inside the R530 drivers which is fixed in R535 drivers.
Please read this thread and the link to the more general debugging topic in there.
https://forums.developer.nvidia.com/t/printf-not-working-in-optix-kernel/246586

obivously there was a “local IAS” setup, which is needed for motion blur and for volume scattering:

sbtOffset = index * ray_count;  // this is now always zero for the "local IAS"
optix_instances.sbtOffset = sbtOffset;

Generally index * ray_count works for the other cases, but the local IAS is one level under the IAS.
I’ve set it to zero now. (optix_instances.sbtOffset = 0;) Now the validation error is gone.

So you’re saying that you’re using multi-level acceleration structures.

Please study the instance and SBT setup described inside the OptiX Programming Guide chapter 7.3 covering different cases and esp. 7.3.5 Example SBT for a scene
That explains which sbtOffset needs to be used for the instances.
This also shows GAS with more than one build-input and more than one SBT record for one of the build inputs.
That the instance1 in that example starts at sbtOffset 6 is what I meant with prefix sum in your previous thread about GAS memory consumption.

Mind that the instance offset of the bottom-most instance is affecting the final SBT index selecting the hit record. If you used sbtOffset = index * ray_count; for the local IAS and index is the number of instances in your top-level IAS, and your SBT only contains as many hit records (times ray types) as there are top-level instances (which wasn’t shown inside your code excerpts), then that is obviously accessing the SBT out of bounds.

I’m not using an SBT hit record per instance in my later OptiX 7 examples (rtigo10 and MDL_renderer) anymore to keep the SBT small. Instead I use one SBT hit record per material shader in rtigo10 and and only five hit records inside the MDL_renderer, four variants from the product of (no emission, emission) x (no cutout, cutout)closesthit and anyhit programs and one for cubic B-spline curves. (The no-emission ones are for performance optimization.)
All additional data is accessed via the user-defined instanceId field. That indexes into an array of GeometryInstanceData structures which define the necessary IDs for the material, light, and object, and the device pointers to the vertex attributes and primitive indices.
Means the instance sbtOffset only select the hit record, the instanceId picks the material and light callables and their parameters and the geometry information used inside the hit programs.
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/MDL_renderer/src/Device.cpp#L1589
The optixTrace arguments switch just the ray type, as usual.

Thank you very much for your answer.
I’ve seen the threads related to printf before, but I thought that issue was only related to CUDA 12, not to the driver. OK, now I know.

I have always one build-input in any IAS and so this prefix sum for one instance is the number of ray types: (index * ray_count). Since there is no second “local” instance, the sbtOffset is 0 for the “local IAS”. I simply somewhow missed, that the “local IAS” is on a lower hierarchy level.
Thanks again for the clarification.

Your “keeping the SBT small” architecture approach is brilliant, when it comes to SBT clarity and obviously also speed advantages when rebuilding such smaller SBT’s.
For calculating motion differences I have already a global SubsetDefinition struct buffer, but only for face indices and a current/previous vertex buffer, I’m simply for now adding the material id buffer into it, then I have something similar to your “GeometryInstanceData” array. Yet I still re-use an subset id passed through the sbt-record, changing that needs more refactoring.
I didn’t know, that the instanceId field is user-defined. I thought that must exactly fit the index value within the SBT.
Currently for now I will keep the implementation as is, since its working ;)
Only adding a face-based (instead of subset-based) material id; But I think I’ll also try to apply your new architecture in my renderer in the future.

Right, there are three useful things inside the OptixInstance which can be used for different tricks.
The sbtOffset affects the SBT hit record used via that SBT index formula directly, the instanceId can be anything you need, and there is also the zero-based instance index which comes from the order in which the instances have been put into its IAS.
Check all the optixGetInstance* device functions, here optixGetInstanceId and optixGetInstanceIndex.

1 Like

in the OptiX SDK curve.h functions (for example optixGetCatmullRomVertexData()) is shown to take a “gasSbtIndex” parameter,
obtained from optixGetSbtGASIndex()
when I now would only put one curve material into the SBT, which then would be used by more than one instance in the IAS, how would I get the associated vertex data, if using your archtitecture with the instanceId?
So for curves it seems all instances need to be still present within the SBT, right?

So for curves it seems all instances need to be still present within the SBT, right?

No, not in my MDL_renderer architecture.

when I now would only put one curve material into the SBT, which then would be used by more than one instance in the IAS, how would I get the associated vertex data, if using your archtitecture with the instanceId ?

First, I’m not using OPTIX_BUILD_FLAG_ALLOW_RANDOM_VERTEX_ACCESS because that makes the acceleration structure bigger and slower, which means I cannot call any of the functions fetching vertex data from the GAS (optixGetTriangleVertexData, optixGetLinearCurveVertexData, optixGetQuadraticBSplineVertexData, optixGetCubicBSplineVertexData, optixGetCatmullRomVertexData, optixGetSphereData).

Instead I store all vertex attributes and indices into device memory (where it’s needed for the AS build anyway) and access it via the instanceId by indexing into an array of a small custom GeometryInstanceData structure as described above with the link to its definition:

struct GeometryInstanceData
{
  // 16 byte alignment
  // Pack the different IDs into a single int4 to load them vectorized.
  int4 ids; // .x = idMaterial, .y = idLight, .z = idObject, .w = pad
  // 8 byte alignment
  // Using CUdeviceptr here to be able to handle different attribute and index formats.
  CUdeviceptr attributes;
  CUdeviceptr indices;
};

The ids in that struct allow assigning different materials to different instances of the same GAS. In that case the attributes and indices pointers of that GAS are reused, so there is only one GAS built for instanced geometries and that’s assigned to all instances’ traversableHandle field using that GAS, as usual.

These GeometryInstanceData structures are accessed inside my closest hit programs like this:
GeometryInstanceData theData = sysData.geometryInstanceData[optixGetInstanceId()];
and because triangles and curves use different closesthit programs, they know how to interpret the data behind the generic attributes and indices pointers accordingly, for triangles:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/MDL_renderer/shaders/hit.cu#L100
and for cubic B-spline curves:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/MDL_renderer/shaders/hit.cu#L1307

That way materials and geometries are decoupled and the SBT only needs to store hit records per material shader.

If you look at the OptiX SDK examples, you also need to follow where the optixGetCatmullRomVertexData function is used there and that shows that it calls optixGetSbtGASIndex to get the gasSbtIndex inside functions called by the closesthit programs and that should always return the GAS index of the currently intersected primitive. That is usually zero when there is only one build input

Then if you read the chapter about the differences between curves, spheres, and triangles you’ll find that curves cannot have SBT index buffer, means no different SBT records per primitive, so only curve GAS with more than one build input can have gasSbtIndex values greater than 0.
That doesn’t happen in my MDL_renderer implementation. Is that a use case in your renderer implementation?

1 Like

I don’t use instancing at all, cause bone+shapes animation would be much slower, when not pre-calculated. And normally I never needed instancing of an exact identical mesh.
So for me I go with:
const unsigned int instanceId = valid_subset ? subset_id : gi.getLight_ID();
which simply reduces the complexity to an object subset array and a light definition array. (emissive objects are simply subsets, which only are seen as subsets (e.g. triangle-objects) during intersection and seen as lights for light sampling). Only subsets have materials, which are indexed from the subset data itself, saving the need of another buffer.
However, I think I could use instancing of the same GAS even this way, cause it then would simply have another entry in the IAS, with a different transform an a different instanceId linking to a “virtual subset” redirecting to an exisiting subset for geometry data.
Currently I got the light sources which use sphere/ellipsoid/box/cylinder geometry working with the new instance id referencing.

So when sharing the same material in one SBT entry also works for a curve type (I normally only use catmull rom), in the future, I’ll change the vertex data fetching for curves to your way of doing it.

No, I always only use one build input anywhere. So I can use gasSbtIndex == 0 there always avoiding the call to optixGetSbtGASIndex. as long as I still yet use optixGetCatmullRomVertexData Thank you for pointing that out!

After implementing your idea for curves, from my older post I recognized @dhart’s answer, where he pointed out, that keeping that vertex buffer causes the data to be twice in memory. So using the vertex functions have a real advantage related to the memory usage.
Especially when realizing that the thickness data is stored in a separate buffer, which also needs to be present, keeping all that buffers additionally is costly.
However, when using temporal denoiser I still need both buffers to calculate the difference between current curve hitpoint and previous hitpoint; but in case the temporal denoiser is not used, calling the vertex functions saves that memory.

So the only difference would be the OPTIX_BUILD_FLAG_ALLOW_RANDOM_VERTEX_ACCESS flag, about you said

Is there a general assumption how much bigger they are? Would that about compensate the vertex / thickness / index data?

not using the vertex functions give me a slightly better speed, but in my test (no temporal denoiser) I get exactly the same memory sizes:

not using OPTIX_BUILD_FLAG_ALLOW_RANDOM_VERTEX_ACCESS flag:
(using vertex / index / thickness buffers directly for calculating normals)

accel_options.buildFlags= OPTIX_BUILD_FLAG_ALLOW_COMPACTION | OPTIX_BUILD_FLAG_PREFER_FAST_TRACE
accel_options.operation= OPTIX_BUILD_OPERATION_BUILD
curveType =9476   (OPTIX_PRIMITIVE_TYPE_ROUND_CATMULLROM)
g->curve_primitive_count=749446
compacted_gas_size=224422500
gas_buffer_sizes.outputSizeInBytes=256241764
=> select compacted
cudaMemGetInfo => free_gpu_mem=21596800h (~533.6mb)
Frame Time=  70msec - 72msec

using OPTIX_BUILD_FLAG_ALLOW_RANDOM_VERTEX_ACCESS flag:
(using optix vertex functions for calculating normals)

accel_options.buildFlags= OPTIX_BUILD_FLAG_ALLOW_RANDOM_VERTEX_ACCESS | OPTIX_BUILD_FLAG_ALLOW_COMPACTION | OPTIX_BUILD_FLAG_PREFER_FAST_TRACE
accel_options.operation= OPTIX_BUILD_OPERATION_BUILD
curveType =9476   (OPTIX_PRIMITIVE_TYPE_ROUND_CATMULLROM)
g->curve_primitive_count=749446
compacted_gas_size=224422500
gas_buffer_sizes.outputSizeInBytes=256241764
=> select compacted
cudaMemGetInfo => free_gpu_mem=22796800h (~551.6mb) 
Frame Time=  72msec - 74msec

.

when using the temporal denoiser (incl flow vector calculation for cuves, albedo and normals):

 no vertex functions for calculating normals: 152msec - 156msec per frame
  (free_gpu_mem=1e796800h (~487.6mb)  )
vertex functions for calculating normals: 157msec - 160msec per frame 
  (free_gpu_mem=1e796800h (~487.6mb)  )

frame times may be compromised by other applications.

I did these tests several times and noticed, that in some early tests I had a buildflag mismatch: OPTIX_BUILD_FLAG_ALLOW_RANDOM_VERTEX_ACCESS was not set in the OptixBuiltinISOptions.buildFlags, but is was set in OptixAccelBuildOptions.buildFlags; no valdiation error occured, no different output.

all still on driver 531.79

since the temporal denoiser will be in use nearly always, I simply use the direct access without the vertex functions.

Is there a general assumption how much bigger they are? Would that about compensate the vertex / thickness / index data?

I have no measurements about how that affects curve primitives. It’s interesting that adding the OPTIX_BUILD_FLAG_ALLOW_RANDOM_VERTEX_ACCESS doesn’t change the compacted curve GAS size on your system configuration.

That was more a comment about built-in triangles where that is also affected by the underlying hardware. RTX GPUs will show differences for those. Your Pascal GPU will behave differently for built-in triangles.

I’m also especially not using OPTIX_BUILD_FLAG_ALLOW_RANDOM_VERTEX_ACCESS and OPTIX_BUILD_FLAG_PREFER_FAST_TRACE flags in my multi-GPU examples when sharing GAS across NVLINK bridges because the GAS size affects the performance more then.

Note that the geometry acceleration structure size is mainly affected by the different AS optimization settings.
That also includes curve primitives: https://raytracing-docs.nvidia.com/optix7/guide/index.html#curves#splitting-curve-segments
You’re using OPTIX_BUILD_FLAG_PREFER_FAST_TRACE which usually results in bigger AS sizes.

The same advice for AS comparisons as in this post apply for each different GPU architecture:
https://forums.developer.nvidia.com/t/build-time-and-bvh-size-for-different-primitive-types/251639/7

1 Like

I have finally fully implemented your new architecture using only one material entry per actually used material in the SBT and then indexing into them from IAS entries.
I removed OPTIX_BUILD_FLAG_ALLOW_RANDOM_VERTEX_ACCESS to have speed improvements for built-in triangles.
All works now

Thank you very much!

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.