Segmentation fault during optixModuleCreate

Hi all

I used OptiX to create a scientific raytracing simulation for highly scattering media like porous ice.
I haven’t used the “Debug” mode in a while, since most of the work on the simulation is finished, and it runs fine when built in “Release”.

To investigate a 700: illegal memory access that seems to occur only under special circumstances, I wanted to rebuild the simulation in “Debug” and realised that a segfault occurs during program execution after optixModuleCreate is called. This does not occur in “Release”.

I use OptiX 8.0.0 with the Driver Version: 575.51.03 built with the CUDA Toolkit 12.4 under Linux.
The simulation has python bindings created using pybind11.

Below are the outputs from the program and the backtrace I got from cuda-gdb.
Note that I removed the numerous curand warnings concerning double calculations from the log.

Do you have any suggestions on how to further investigate this issue?

cuda-gdb_bt.txt (22.5 KB)

output.txt (3.0 KB)

Many thanks in advance
Rafael

UPDATE: I tried the same under Windows (Driver Version: 560.94, RTX2080 Ti, Toolkit 12.4) now.

Here, the debug launch works, and I get a useful error for the illegal memory access.
From what I understand, these are therefore unrelated issues.

Starting from the warning "[ 3][ MEMORY]: cblCmdListBufferSize insufficient, attempting allocation", I found this forum entry: "Intermittent hanging in debug mode only".
However, my issue seems to be unrelated, since the two flags are set correctly:

#if !defined(NDEBUG)
    moduleCompileOptions.optLevel = OPTIX_COMPILE_OPTIMIZATION_LEVEL_0;
    moduleCompileOptions.debugLevel = OPTIX_COMPILE_DEBUG_LEVEL_FULL;
    std::cout << "Start optixModuleCreate" << std::endl;
#endif

Maybe you also have an intuition for how I could approach this issue.
Is my hit_program too complex/large?

Here is the output:

[ 4][    COMPILER]: Function properties for __closesthit__ch_ptID_0x2153cd83293ffa55
register count                  :   128
direct stack size (bytes)       : 13736
direct spills (bytes)           :  4280
continuation stack size (bytes) :     0
continuation spills (bytes)     :     0

[ 4][    COMPILER]: Function properties for __miss__ms_ptID_0x2153cd83293ffa55
register count                  :   122
direct stack size (bytes)       :   120
direct spills (bytes)           :     0
continuation stack size (bytes) :     0
continuation spills (bytes)     :     0

[ 4][    COMPILER]: Info: Module Statistics
payload values        :         22
attribute values      :          0
Info: Properties for entry function “__miss__ms”
semantic type                :                   MISS
trace call(s)                :                      0
continuation callable call(s):                      0
basic block(s)               :                      4
instruction(s)               :                     40
Info: Properties for entry function “__closesthit__ch”
semantic type                :             CLOSESTHIT
trace call(s)                :                      0
continuation callable call(s):                      0
basic block(s)               :                    770
instruction(s)               :                  28196
Info: Compiled Module Summary
non-entry function(s):    54
basic block(s)       :   211
instruction(s)       :  7543

[ 4][    COMPILER]: Function properties for __raygen__rg_0x2153cd83293ffa55
register count                  :   128
direct stack size (bytes)       :  3272
direct spills (bytes)           :  3080
continuation stack size (bytes) :  1408
continuation spills (bytes)     :   896

[ 4][   DISKCACHE]: Inserted module in cache with key: ptx-1541392-keyfdeee5946a461611304e4e7cb3112cc2-sm_75-rtc1-drv560.94
[ 4][    COMPILER]: Info: Module Statistics
payload values        :         22
attribute values      :          0
Info: Properties for entry function “__raygen__rg”
semantic type                :                 RAYGEN
trace call(s)                :                      1
continuation callable call(s):                      0
basic block(s)               :                    709
instruction(s)               :                  14453
Info: Compiled Module Summary
non-entry function(s):     0
basic block(s)       :     0
instruction(s)       :     0

Finished optixModuleCreate
[ 4][   DISKCACHE]: Cache miss for key: ptx-93-key35203fa86dfc75ad1d747c239ee6fb67-sm_75-rtc1-drv560.94

[ 4][    COMPILER]:
[ 4][    COMPILER]: Function properties for __exception__default_0x42ada95ff871dbf3
register count                  :    27
direct stack size (bytes)       :     0
direct spills (bytes)           :     0
continuation stack size (bytes) :     0
continuation spills (bytes)     :     0

[ 4][   DISKCACHE]: Inserted module in cache with key: ptx-93-key35203fa86dfc75ad1d747c239ee6fb67-sm_75-rtc1-drv560.94
[ 4][    COMPILER]: Info: Module Statistics
payload values        :          0
attribute values      :          0
Info: Properties for entry function “__exception__default”
semantic type                :              EXCEPTION
trace call(s)                :                      0
continuation callable call(s):                      0
basic block(s)               :                     50
instruction(s)               :                    275
Info: Compiled Module Summary
non-entry function(s):     0
basic block(s)       :     0
instruction(s)       :     0

[ 4][    COMPILER]: Info: Pipeline statistics
module(s)                            :     1
entry function(s)                    :     3
trace call(s)                        :     1
continuation callable call(s)        :     0
direct callable call(s)              :     0
basic block(s) in entry functions    :  1483
instruction(s) in entry functions    : 42689
non-entry function(s)                :    54
basic block(s) in non-entry functions:   211
instruction(s) in non-entry functions:  7543
debug information                    :   yes

[ 3][      MEMORY]: cblCmdListBufferSize insufficient, attempting allocation
[ 3][      MEMORY]: cblCmdListBufferSize insufficient, attempting allocation
[ 2][       ERROR]: Error syncing stream (CUDA error string: an illegal memory access was encountered, CUDA error code: 700)
Error freeing CBL command list buffer (CUDA error string: an illegal memory access was encountered, CUDA error code: 700)
Error freeing CBL command list buffer (CUDA error string: an illegal memory access was encountered, CUDA error code: 700)
Error recording resource event on user stream (CUDA error string: an illegal memory access was encountered, CUDA error code: 700)
Error recording resource event on user stream (CUDA error string: an illegal memory access was encountered, CUDA error code: 700)
Error launching work to RTX
Error recording resource event on user stream (CUDA error string: an illegal memory access was encountered, CUDA error code: 700)


I’d recommend starting by enabling validation mode. This should work on both debug and release builds on Linux with the 575 driver. Validation mode will slow things down (so don’t leave it permanently turned on), and it will cause the OptiX launch to synchronize. Another way to force synchronization - without even needing to rebuild - is to set the environment variable CUDA_LAUNCH_BLOCKING=1. One of these options might help you verify whether there really is something wrong with CBL (which is an internal library we use for managing kernel launches).
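A minimal sketch of how validation mode might be toggled at runtime, so the same Release binary can run with and without it. The environment variable RT_OPTIX_VALIDATION and both helper names are made up for illustration; validationMode and the OPTIX_DEVICE_CONTEXT_VALIDATION_MODE_* values are fields of the OptiX device context options, and that part is only compiled when the OptiX headers are on the include path.

```cpp
#include <cstdlib>
#include <cstring>

#if __has_include(<optix.h>)
#include <optix.h>
#endif

// Hypothetical opt-in switch: true when the (made-up) environment variable
// RT_OPTIX_VALIDATION is set to anything other than an empty string or "0".
inline bool validationRequested() {
    const char* value = std::getenv("RT_OPTIX_VALIDATION");
    return value != nullptr && value[0] != '\0' && std::strcmp(value, "0") != 0;
}

#if __has_include(<optix.h>)
// Builds device context options with validation enabled only on request,
// independent of whether the build is Debug or Release.
inline OptixDeviceContextOptions makeContextOptions() {
    OptixDeviceContextOptions options = {};
    options.validationMode = validationRequested()
        ? OPTIX_DEVICE_CONTEXT_VALIDATION_MODE_ALL
        : OPTIX_DEVICE_CONTEXT_VALIDATION_MODE_OFF;
    return options;
}
#endif
```

The returned options would then be passed to optixDeviceContextCreate. Reading the switch from the environment avoids a rebuild, in the same spirit as CUDA_LAUNCH_BLOCKING=1.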

You can also try a newer driver on Linux. There’s always a chance the CUDA toolkit version matters too. I hate suggesting things that take a lot of time to install and uninstall just to see if they work, but I also don’t want to leave useful triage paths out, so apologies in advance!

Another option is to install an old 560 driver and set OPTIX_FORCE_DEPRECATED_LAUNCHER=CUDA or OPTIX_FORCE_DEPRECATED_LAUNCHER=CBL1. That would also help validate or rule out if this is CBL-related. (Thanks to Kyle for the tip).

As far as whether your shader programs are too large - the size should not cause crashes, even if they’re large. Your compiler feedback is showing closest-hit and raygen have large stacks and a lot of spilled memory. This is typical of very large shaders, and there’s not always any good way to avoid this, but if you can trim down the stack & spills, it will likely help a lot with performance.

Hi David

Thanks for the quick reply!
I tested your suggestions both on Linux and Windows:

Linux

r575

Debug & Validation mode:
Still the same segmentation fault in optixModuleCreate (validation mode had no effect)

Release & Validation mode:

[ 4][    COMPILER]: Info: Pipeline statistics
module(s)                            :     1
entry function(s)                    :     3
trace call(s)                        :     1
continuation callable call(s)        :     0
direct callable call(s)              :     0
basic block(s) in entry functions    :   322
instruction(s) in entry functions    :  7890
non-entry function(s)                :     0
basic block(s) in non-entry functions:     0
instruction(s) in non-entry functions:     0
debug information                    :    no

[ 2][VALIDATION_ERROR]: [TRAVERSABLE_GRAPH_DEPTH_EXCEEDED] traversable graph depth exceeded during traversal
launch index: [131, 0, 0]
additional occurrences: 36
transform list:
size: 1
traversable handle:
handle: 0xe00072dd82aeca09
traversable type: instance transform
The traversal depth of the scene graph passed to an optixTrace call exceeds the maximum traversable graph depth. The maximum traversable graph depth set using optixPipelineSetStackSize.
[ 2][       ERROR]: Error syncing stream (CUDA error string: unspecified launch failure, CUDA error code: 719)
Error recording resource event on user stream (CUDA error string: unspecified launch failure, CUDA error code: 719)
Error recording resource event on user stream (CUDA error string: unspecified launch failure, CUDA error code: 719)
Error launching work to RTX
Error recording resource event on user stream (CUDA error string: unspecified launch failure, CUDA error code: 719)
Simulation failed: /home/rafael/code/planetaryraytracer/gpu-raytracer/src/photonTracer/raytracing_pipeline.cpp(58): optixLaunch(pipeline_, stream, dParam, sizeof(InputParameters), &sbt_, params.numberOfRays, 1, 1) failed with error 7053 (OPTIX_ERROR_VALIDATION_FAILURE) Error during validation mode run

I don’t have any recursive optixTrace calls, so this error surprises me.
We use an iterative approach with a for loop in the __raygen__rg program.

r580

I updated the driver to r580 and used CUDA Toolkit 13.0 to compile the simulation. Neither had any effect.

Windows

r560

Using validation mode and CUDA_LAUNCH_BLOCKING had no impact in either Debug or Release.

However, the deprecated launcher did have an impact:

No env variable set:

[ 4][    COMPILER]: Info: Pipeline statistics
module(s)                            :     1
entry function(s)                    :     3
trace call(s)                        :     1
continuation callable call(s)        :     0
direct callable call(s)              :     0
basic block(s) in entry functions    :  1483
instruction(s) in entry functions    : 42689
non-entry function(s)                :    54
basic block(s) in non-entry functions:   211
instruction(s) in non-entry functions:  7543
debug information                    :   yes

[ 3][      MEMORY]: cblCmdListBufferSize insufficient, attempting allocation
[ 3][      MEMORY]: cblCmdListBufferSize insufficient, attempting allocation
[ 2][       ERROR]: Error syncing stream (CUDA error string: an illegal memory access was encountered, CUDA error code: 700)
Error freeing CBL command list buffer (CUDA error string: an illegal memory access was encountered, CUDA error code: 700)
Error freeing CBL command list buffer (CUDA error string: an illegal memory access was encountered, CUDA error code: 700)
Error recording resource event on user stream (CUDA error string: an illegal memory access was encountered, CUDA error code: 700)
Error recording resource event on user stream (CUDA error string: an illegal memory access was encountered, CUDA error code: 700)
Error launching work to RTX
Error recording resource event on user stream (CUDA error string: an illegal memory access was encountered, CUDA error code: 700)

With: $env:OPTIX_FORCE_DEPRECATED_LAUNCHER = "CBL1"

[ 4][    COMPILER]: Info: Pipeline statistics
module(s)                            :     1
entry function(s)                    :     3
trace call(s)                        :     1
continuation callable call(s)        :     0
direct callable call(s)              :     0
basic block(s) in entry functions    :  1483
instruction(s) in entry functions    : 42689
non-entry function(s)                :    54
basic block(s) in non-entry functions:   211
instruction(s) in non-entry functions:  7543
debug information                    :   yes

[ 2][       ERROR]: Error syncing stream (CUDA error string: an illegal memory access was encountered, CUDA error code: 700)
Error launching work to RTX
Error recording resource event on user stream (CUDA error string: an illegal memory access was encountered, CUDA error code: 700)

Does this imply that the error is not related to the CBL library?

Do you think the segfault on Linux in optixModuleCreate in Debug and the memory errors might be related?

Kind regards
Rafael

After typing my answer above, I realized that I had confused traceDepth and traversableGraphDepth in the pipeline creation. I had mistakenly set traversableGraphDepth to 1 even though the geometry is nested. Interestingly, it only failed in a few cases after months of usage.
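For anyone hitting the same validation error, here is a rough sketch of how the required traversable graph depth could be derived from the scene layout instead of being hard-coded. TraversableNode and traversableGraphDepth are hypothetical helpers, not part of OptiX; each node stands for one traversable (IAS, transform, or GAS).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified description of a scene graph: one node per
// traversable, listing the traversables directly below it.
struct TraversableNode {
    std::vector<const TraversableNode*> children;
};

// Counts the traversables on the longest path from `root` down to a leaf.
// A lone GAS yields 1, an IAS over GAS instances yields 2.
inline unsigned int traversableGraphDepth(const TraversableNode& root) {
    unsigned int deepestChild = 0;
    for (const TraversableNode* child : root.children) {
        assert(child != nullptr);
        deepestChild = std::max(deepestChild, traversableGraphDepth(*child));
    }
    return 1u + deepestChild;
}
```

The result is the kind of value expected for the maxTraversableGraphDepth argument of optixPipelineSetStackSize. It is independent of maxTraceDepth in the pipeline link options, which counts nested optixTrace calls and can stay at 1 for an iterative loop in raygen.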

So the good news: the initial illegal memory access 700 is solved! Thanks a lot for the tip that validation mode is available in Release. I had only ever used it in Debug.

However, the issue with the segmentation fault during optixModuleCreate in Debug on Linux, as well as the memory errors shown above with Debug on Windows still persist.

Excellent triaging, and I’m glad the illegal memory access on launch is solved. It sounds like validation isn’t helping with the module compile crash, and it’s unfortunate that this occurs in Debug. I think the ideal way forward is to send a reproducer, if that’s possible. You can send it privately via DM to me, or use the OptiX help mailing list. It would be sufficient to send only the inputs to optixModuleCreate; we don’t need to run the whole application. In fact, you can verify that the inputs to module compile crash in Debug using the SDK sample called optixCompileWithTasks. You might have to fiddle with the payload and attributes arguments to get it to run on your input file. If and when the Debug build of optixCompileWithTasks reproduces the crash, you could send just the input file our way.

If sending the reproducing input isn’t viable for legal or policy reasons, we understand. In that case, it might still be worth trying to reproduce using optixCompileWithTasks, and then seeing if you can bisect or isolate the code that might be causing the problem. Delete a chunk at a time and test, and see how far you can narrow it down. That might give a clue as to what code feature is causing the compile crash.


David.

Hi @rafael.ottersberg,

I got your reproducer, thank you! I can reproduce an issue during compile; hopefully it’s the same issue. For some reason, it’s crashing because it runs out of registers. I don’t think this is supposed to happen, so I’ve filed a bug report. In the meantime, I was able to get around the error by setting the module compile options’ maxRegisterCount variable to 140 or higher. Just wanted to post this publicly in case others run into a mystery compiler crash with debug flags enabled.
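A minimal sketch of that workaround, assuming the options struct is the same moduleCompileOptions shown earlier in the thread. The helper name raiseRegisterLimit is made up, and the stand-in struct exists only so the snippet compiles without the OptiX SDK; with the SDK present, the real OptixModuleCompileOptions is used.

```cpp
#include <cassert>

#if __has_include(<optix.h>)
#include <optix.h>
#else
// Stand-in for builds without the OptiX headers; only the field used here.
struct OptixModuleCompileOptions {
    int maxRegisterCount = 0;  // 0 means no explicit limit in OptiX
};
#endif

// Hypothetical helper: raises maxRegisterCount to at least `minimum`
// (140 is the value reported to avoid the Debug compile crash).
inline void raiseRegisterLimit(OptixModuleCompileOptions& options,
                               int minimum = 140) {
    assert(minimum > 0);
    if (options.maxRegisterCount < minimum) {
        options.maxRegisterCount = minimum;
    }
}
```

Calling raiseRegisterLimit(moduleCompileOptions) inside the existing #if !defined(NDEBUG) block, before optixModuleCreate, would keep Release builds on their current register settings.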

The lead times for compiler fixes have become longer than I’d like but I’ll post again when this is fixed and scheduled for release.


David.

Hi @dhart

Indeed this solved the issue for me as well.
Thanks for the outstanding support!

Best,
Rafael