Nsight Compute + OptiX 8 / Unsupported multi-level instancing detected for traversable handle

Our OptiX shader only reaches about 90% GPU usage, so I am trying to profile the bottleneck with Nsight Compute.
However, it reports "Unsupported multi-level instancing detected for traversable handle", so I cannot view the scene from the root-level IAS. Is that by design, or do I need some extra configuration?

I also don’t have a clear idea of how to profile OptiX with Nsight. We can see some GASes, but no IAS, and other resources such as OptixModule. There also seems to be no way to capture a profile while the auto-profile button is enabled.

Hi @iaomw,

Where are you finding the 90% usage? I would be careful about assuming that value is both accurate and represents exactly what you think. It might be accurate but different from what you’re expecting. For a typical kernel in Nsight Compute, 90% compute throughput is quite high. But keep in mind that Nsight tools measure and report SM usage of code you write, and may not measure or report on OptiX or other driver internals. Time spent in the driver may therefore not appear in your profiling metrics, which means the GPU could be 100% loaded even though the stats say 90%.

Is the multi-level instancing error coming from your application or from Nsight Compute? It sounds like an application error that you would see even if you ran it directly in the console. What are your OptiX pipeline flags set to, and are you using a multi-level IAS scene?

I haven’t used the auto profile option in Nsight Compute. That will try to profile the BVH build and other things, which will not give you useful information. So I’d recommend manually profiling the optixLaunch kernel, or using a kernel regex to select optixLaunch. Once you have profiled the launch, you will be able to see details and per-instruction profiling data in your raygen and hit programs, and you will be able to find and inspect instruction stall reasons, as well as see data on cache hit rates, various SM pipeline throughput statistics, etc. This will help you pinpoint any bottlenecks. It may take some time to get used to Nsight Compute and understand how to identify your bottlenecks. It’s very rare that bottlenecks are either extremely easy to spot or extremely easy to solve, so have some patience and feel free to ask more questions or post images of profiling data.
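For reference, selecting just the OptiX launch with the Nsight Compute CLI looks roughly like the following. The kernel-name regex and the application name are assumptions; the actual kernel name is easiest to confirm once in the ncu UI.

```shell
# Profile only the first kernel whose name matches the regex (the OptiX
# launch typically shows up as a single megakernel; "raygen" is a guess at
# a matching substring). Writes a full report to launch.ncu-rep.
ncu -k regex:raygen --launch-count 1 --set full -o launch ./my_app
```

The same kernel filter is available in the UI's connection dialog, which avoids profiling BVH builds and other internal kernels.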


David.

Thanks.

  • The 90% GPU usage simply comes from the Windows Task Manager; may I trust it in general?
  • The “Unsupported multi-level instancing” warning comes from Nsight Compute when I use “Open As Viewer”. I can render the scene without any problem, but it’s not visible in Nsight. Yes, the scene is indeed multi-level; depending on the scene data, it can be a typical two-level or a deeper multi-level hierarchy.

What are your OptiX pipeline flags?

Well, we have OptixPipelineLinkOptions and OptixPipelineCompileOptions; only the compile options contain flags, so I assume you mean those.

    pipeline_compile_options = {};
    pipeline_compile_options.traversableGraphFlags = OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_ANY; 
    pipeline_compile_options.usesMotionBlur        = false;
    pipeline_compile_options.numPayloadValues      = 2;
    pipeline_compile_options.numAttributeValues    = 2;
    pipeline_compile_options.pipelineLaunchParamsVariableName = "params";

    pipeline_compile_options.exceptionFlags = OPTIX_EXCEPTION_FLAG_STACK_OVERFLOW | OPTIX_EXCEPTION_FLAG_TRACE_DEPTH | OPTIX_EXCEPTION_FLAG_USER;
    pipeline_compile_options.usesPrimitiveTypeFlags = usesPrimitiveTypeFlags; // generated by the scene pass
    pipeline_compile_options.allowOpacityMicromaps = true;
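As a side note (this would not silence the Nsight warning for a genuinely multi-level scene): OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_ANY permits arbitrary graph depth. If a scene can be flattened to at most one IAS over GASes, restricting the flag both documents that and can enable traversal optimizations. A minimal config fragment:

```cpp
// Restrict the pipeline to single-level instancing (one IAS over GASes).
// Only valid if the scene graph really is at most two levels deep.
pipeline_compile_options.traversableGraphFlags =
    OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_LEVEL_INSTANCING;
```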

There were permission problems last week; administrator rights made trouble for me.

It works fine on its own without Nsight. When launching it from Nsight with all-users permission, it crashes at optixAccelCompact because outputBufferSizeInBytes is zero; maybe optixAccelComputeMemoryUsage gives me a wrong result when running under Nsight.

Then I disabled compaction, and it crashed while building a GAS with OMMs. Right now I don’t use any multi-threading for building GAS/OMM; everything runs in order as buildOMM -> buildGAS. I also don’t share OMMs between GASes at this stage. The OMM data is baked by cuOmmBaking, which I assume I can trust. I am using cudaMallocAsync; does that matter here?
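For context, a zero compacted size is usually a read-before-ready problem rather than optixAccelComputeMemoryUsage itself: the compacted size is emitted by the GPU during optixAccelBuild, so the host must synchronize the stream before reading it. A hedged sketch of the usual flow, assuming SDK-style OPTIX_CHECK/CUDA_CHECK macros and placeholder buffer names:

```cpp
// Ask optixAccelBuild to emit the compacted size into device memory.
OptixAccelEmitDesc emit = {};
emit.type   = OPTIX_PROPERTY_TYPE_COMPACTED_SIZE;
emit.result = d_compactedSize;  // device pointer to a size_t

OPTIX_CHECK( optixAccelBuild( context, stream, &accel_options,
                              &build_input, 1,
                              d_temp,   temp_size,
                              d_output, output_size,
                              &handle, &emit, 1 ) );

// The emitted value is written by the build on 'stream'; reading it on the
// host without synchronizing that stream can yield 0 or stale data.
size_t compacted_size = 0;
CUDA_CHECK( cudaMemcpyAsync( &compacted_size, (void*)d_compactedSize,
                             sizeof( size_t ), cudaMemcpyDeviceToHost, stream ) );
CUDA_CHECK( cudaStreamSynchronize( stream ) );

if( compacted_size < output_size )
{
    CUDA_CHECK( cudaMallocAsync( (void**)&d_compacted, compacted_size, stream ) );
    OPTIX_CHECK( optixAccelCompact( context, stream, handle,
                                    d_compacted, compacted_size, &handle ) );
}
```

Running under a profiler changes timing, which is exactly when a missing synchronization like this tends to surface.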

[ 2][VALIDATION_ERROR]: [OMM_INDEX_OUT_OF_BOUNDS] Out-of-bounds OMM index encountered. This issue could be due to 1) specifying the wrong OMM base location 2) encoding the wrong index in the OMM index-buffer 3) specifying the wrong stride between indices 4) associating the wrong OMM index-buffer 5) having fewer OMMs (in the OMM array) than primitives (when using no OMM index-buffer)6) associating the wrong OMM Array (when using no OMM index-buffer)
    geometry index: 0
    primitive index: 0
    OMM index: 1
    OMM count: 0
    OMM base location: 0
    OMM index-buffer address: 0x320ef8c00
    current operation:
        type: build
        object type: BLAS
        object address: 0x31272fc80

[ 2][VALIDATION_ERROR]: [OMM_INCOMPLETE_IN_BLAS] Incomplete OMM-array encountered. Missing barrier? This could be due to 1) read-after-write hazard, e.g. building/accessing the BLAS before the OMM-array was ready (i.e. on-going build/copy/relocation), 2) write-after-read hazard, e.g. modifying the OMM array (or re-using its resource for another purpose) while building/accessing the BLAS
    contending operation on OMM-array:
        type: unspecified
    geometry index: 0
    OMM-array address: 0x31fa15200
    current operation:
        type: build
        object type: BLAS
        object address: 0x31272fc80

Hey @iaomw, I forgot to follow up until now, sorry for the slow response.

The 90% GPU usage simply comes from the Windows Task Manager; may I trust it in general?

I would recommend not trusting the Windows Task manager. I don’t know what it’s reporting, and it does some time averaging that might cause confusing results. Nsight tools will be better, and even with Nsight tools you should pay close attention to metrics and be prepared for inconsistencies and complications.

Nsight Systems has GPU metrics data that can show you overall utilization. nvidia-smi can also show you utilization, clock rates, and power metrics. If those match the Task Manager, then it becomes more trustworthy.

I didn’t quite understand the crash problem. Is it resolved now, or are you still having OMM failures during the build?

David.

@dhart How about the “Unsupported multi-level instancing” warning from Nsight?

Hi, it could be solved easily by calling cudaDeviceSynchronize() before the cuOmmBaking code. This problem didn’t happen when running without Nsight.

For OptiX itself, I guess it’s OK to pass buffers from cudaMallocAsync, since the API calls all take a CUstream parameter. Though it’s probably not fully tested with async code.

If using cudaDeviceSynchronize() fixes the issue, it does suggest there’s an async problem somewhere, even if it only shows up sometimes. That’s the hard part of these kinds of issues: sometimes they appear to work. I believe it is okay to use a buffer from cudaMallocAsync in the same stream, but not in a different stream. You only get implicit synchronization for async work on the same stream. However, it probably won’t hurt to call cudaStreamSynchronize() anyway, just to be safe.
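If cross-stream use of the buffer is needed, a stream-ordered alternative to a full device sync is an event: record it on the producing stream and make the consuming stream wait on it. A sketch, with bytes, streamA, and streamB as placeholders:

```cpp
#include <cuda_runtime.h>

// Allocation and writes are ordered on streamA.
void* d_buf = nullptr;
cudaMallocAsync( &d_buf, bytes, streamA );
// ... enqueue writes to d_buf on streamA (e.g. OMM baking output) ...

// Record an event marking "allocation + writes complete" on streamA,
// then make streamB wait on it before touching d_buf.
cudaEvent_t ready;
cudaEventCreate( &ready );
cudaEventRecord( ready, streamA );
cudaStreamWaitEvent( streamB, ready, 0 );
// ... now safe to use d_buf in work launched on streamB, e.g. a GAS build ...
```

This keeps both streams asynchronous with respect to the host while still enforcing the producer/consumer ordering.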

Is the multi-level instancing error coming from Nsight Compute, or Nsight Graphics?

David.

Thanks, it’s Nsight Compute 2025.1.1. I have never tried Nsight Graphics for OptiX.

It seems Nsight Graphics has supported OptiX since 2021; however, I don’t see any “OptiX” in the menus of the latest version. It does have a separate panel for ray tracing, which is strange.

Nsight Graphics sees OptiX launches as just CUDA launches and doesn’t have any special knowledge about them. You’re not going to get a per-shader breakdown like you do in Nsight Compute. It can still be useful to see your OptiX launches on a frame timeline.

The ray tracing panel in Nsight Graphics, IIRC, is intended for use with DX and Vulkan, which have ray tracing features as well. Not OptiX.