Segfault on call to Optix Trace

The PTX code works on two other computers, but it crashes on a machine with a slightly different setup and OptiX version.

The cuda error is:

terminating due to uncaught exception of type CudaException: CUDA error on synchronize with error 'an illegal memory access was encountered'

Here is the cuda code:


        uint p0_1 = (F32_asuint(((&ctrl_pt_3)->t_1)));
        uint p1_1 = (F32_asuint(((&ctrl_pt_3)->dirac_0.x)));
        uint p2_1 = (F32_asuint(((&ctrl_pt_3)->dirac_0.y)));
        uint p3_1 = (F32_asuint(((&ctrl_pt_3)->dirac_0.z)));
        uint p4_1 = (F32_asuint(((&ctrl_pt_3)->dirac_0.w)));
        uint tri_1 = (F32_asuint(((U32_asfloat((last_tri_1))))));
        float _temp1_1 = {};
        uint p6_1 = (F32_asuint((_temp1_1)));
        float _temp2_1 = {};
        uint p7_1 = (F32_asuint((_temp2_1)));

        optixTrace(
                (globalParams_0->traversable_0),
                (origin_0),
                (direction_1),
                ((F32_abs((prev_t_1)))),
                (globalParams_0->tmax_0),
                0.0f,                // rayTime
                OptixVisibilityMask( 255 ),
                OPTIX_RAY_FLAG_NONE,
                0,                   // SBT offset
                0,                   // SBT stride
                0,                   // missSBTIndex
                (p0_1), (p1_1), (p2_1), (p3_1), (p4_1), (tri_1), (p6_1), (p7_1)); // payload

I A/B tested this by removing the code right before the optixTrace call to see whether the segfault occurs there, but it does not seem to.

I did notice this strange asm code that the cuda code is generating.

	call(%r72,%r73,%r74,%r75,%r76,%r77,%r78,%r79,%r80,%r81,%r82,%r83,%r84,%r85,%r86,%r87,%r88,%r89,%r90,%r91,%r92,%r93,%r94,%r95,%r96,%r97,%r98,%r99,%r100,%r101,%r102,%r103),_optix_trace_typed_32,(%r239,%rd142,%f30,%f31,%f32,%f1,%f2,%f3,%f292,%f701,%f702,%r105,%r239,%r239,%r239,%r239,%r110,%r111,%r252,%r251,%r250,%r249,%r254,%r239,%r239,%r239,%r239,%r239,%r239,%r239,%r239,%r239,%r239,%r239,%r239,%r239,%r239,%r239,%r239,%r239,%r239,%r239,%r239,%r239,%r239,%r239,%r239,%r239,%r239);

I’ve tried commenting out the closest hit, anyhit, and miss programs, and I still get the error. Maybe the issue could be that the pointers to memory for the GAS are NULL? Would that only show up when I call optixTrace?

On the working computer:

[ 4][    COMPILER]: Function properties for __anyhit__ah_ptID_0x3dba565920a30309
	register count                  :   102
	direct stack size (bytes)       :     0
	direct spills (bytes)           :     0
	continuation stack size (bytes) :     0
	continuation spills (bytes)     :     0

[ 4][    COMPILER]: Function properties for __closesthit__ch_ptID_0x3dba565920a30309
	register count                  :   102
	direct stack size (bytes)       :     0
	direct spills (bytes)           :     0
	continuation stack size (bytes) :     0
	continuation spills (bytes)     :     0

[ 4][    COMPILER]: Function properties for __miss__ms_ptID_0x3dba565920a30309
	register count                  :   102
	direct stack size (bytes)       :     0
	direct spills (bytes)           :     0
	continuation stack size (bytes) :     0
	continuation spills (bytes)     :     0
[ 4][    COMPILER]: Function properties for __raygen__rg_float_0x3dba565920a30309
	register count                  :   128
	direct stack size (bytes)       :    88
	direct spills (bytes)           :    80
	continuation stack size (bytes) :   352
	continuation spills (bytes)     :   368

On the failing computer:

[ 4][COMPILE FEEDBACK]: Info: Function properties for __anyhit__ah_ptID_0x8f7a0729eb2f1cb0
        Used 112 registers, 56 bytes stack size, 0 bytes spilled

[ 4][COMPILE FEEDBACK]: Info: Function properties for __closesthit__ch_ptID_0x8f7a0729eb2f1cb0
        Used 113 registers, 56 bytes stack size, 0 bytes spilled

[ 4][COMPILE FEEDBACK]: Info: Function properties for __miss__ms_ptID_0x8f7a0729eb2f1cb0
        Used 112 registers, 48 bytes stack size, 0 bytes spilled

[ 4][COMPILE FEEDBACK]: Info: Function properties for __raygen__rg_0x8f7a0729eb2f1cb0
        Used 128 registers, 632 bytes stack size, 0 bytes spilled

[ 4][COMPILE FEEDBACK]: Info: Function properties for __raygen__rg_float_0x8f7a0729eb2f1cb0
        Used 128 registers, 720 bytes stack size, 0 bytes spilled

It seems to read the ptx code differently?

Optix Version on working computer: 7.6.
Optix Version on non-working computer: 7.4. I cannot change the version.

Are these the same machine configurations as here: https://forums.developer.nvidia.com/t/segfault-when-using-printf-in-closesthit-shader/283172 ?

The code excerpts you posted won’t help solve this, especially since the program works on some machines and not on others. You would first need to determine the exact differences between these machines.

When filing bug reports like this, the following information is required to begin analysis:

  • OS version
  • Installed GPU(s)
  • VRAM amount
  • Display driver version
  • OptiX major.minor.micro version
  • CUDA toolkit version used to generate the module input code (PTX or OptiX-IR?)
  • Host compiler version

All of these are mandatory for both the working and non-working machines.
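
For the "OptiX major.minor.micro version" item, the SDK a program was built against can be read from the OPTIX_VERSION macro in optix.h, which packs the three parts into one integer (for example 70400 for 7.4.0). Below is a minimal sketch of decoding that integer. The struct and function names are hypothetical helpers, not part of the OptiX API, and the sketch takes a plain integer so it compiles without the SDK.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type (not part of the OptiX API): the three version parts.
struct OptixVersionParts
{
    unsigned int majorVer;
    unsigned int minorVer;
    unsigned int microVer;
};

// Splits an OPTIX_VERSION-style integer, e.g. 70400 -> 7.4.0.
constexpr OptixVersionParts decodeOptixVersion(unsigned int version)
{
    return OptixVersionParts{version / 10000u, (version % 10000u) / 100u, version % 100u};
}

// Formats the parts as "major.minor.micro" for pasting into a bug report.
inline std::string toString(const OptixVersionParts& v)
{
    return std::to_string(v.majorVer) + "." + std::to_string(v.minorVer) + "." +
           std::to_string(v.microVer);
}

// Small usage example; a real build would pass OPTIX_VERSION instead of a literal.
inline void exampleDecode()
{
    assert(toString(decodeOptixVersion(70400u)) == "7.4.0");
}
```

In an application that includes optix.h, calling toString(decodeOptixVersion(OPTIX_VERSION)) at startup would log the SDK version the binary was actually compiled against, which removes the guesswork on a machine where the build setup is unclear.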

Optix Version on non-working computer: 7.4. I cannot change the version.

Since the OptiX implementation resides inside the display driver, the exact display driver version is absolutely crucial information.

The first thing to verify is whether the systems use different display driver versions.

If the failing machine uses an older one, update it to the same display driver version running on the working machines.

If the error goes away, it was most likely a CUDA or ray tracing driver issue that was fixed in newer display drivers.

If the display drivers are identical and this is really only happening when building the application against the OptiX SDK 7.4.0 version and not with the OptiX SDK 7.6.0 version, and you cannot change the OptiX SDK version (Why? Because you cannot update the display driver?), there could still be the chance of some error inside the CUDA or OptiX host or OptiX device code.
Things like not default-initializing all OptiX structures could result in different behaviors among OptiX SDK versions, but that would usually break on the newer ones which have added new fields to existing structures.
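
To illustrate the default-initialization point, here is a minimal sketch using a made-up struct. HypotheticalOptions and its fields are assumptions for illustration only and do not correspond to any real OptiX structure. The idea is that value-initializing with `= {}` zeroes every member, including ones a newer SDK header may have added, before the fields you care about are set.

```cpp
#include <cassert>

// Hypothetical stand-in for an SDK options struct; NOT a real OptiX type.
struct HypotheticalOptions
{
    int   maxDepth;
    int   debugLevel;
    void* userData;
    int   fieldAddedInNewerSdk;  // imagine a later SDK release appended this member
};

// Builds options with every member zeroed first, then sets only what is needed.
inline HypotheticalOptions makeDefaultOptions()
{
    HypotheticalOptions options = {};  // value-initialization: all members are zero/null
    options.maxDepth = 2;              // explicit settings come after the zeroing
    return options;
}

// Small usage example: members that were never assigned are still well defined.
inline void exampleDefaults()
{
    const HypotheticalOptions options = makeDefaultOptions();
    assert(options.fieldAddedInNewerSdk == 0);
    assert(options.userData == nullptr);
}
```

Without the `= {}`, a local struct of this kind would hold indeterminate values in every member that is not assigned, which is the sort of bug that can behave differently from one SDK version or machine to the next.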

I’ve tried commenting out the closest hit, anyhit, and miss programs, and I still get the error.

So the issue happens with the ray generation program already?

For debugging this on your side, have you enabled the OptixDeviceContextOptions validation mode, set a logger callback and set the debug level to the maximum value 4? Example code here

Have you implemented an exception program which dumps OptiX exception information? (The validation mode will add one internally if not.)
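
As a rough sketch of the logger callback and validation setup asked about above: the callback below has the same parameter list as OptiX's OptixLogCallback type and prints lines in the "[level][tag]: message" layout seen in the logs earlier in this thread. formatLogLine is a hypothetical helper, and cudaContext and optixContext in the comments are assumed variable names. The OptiX wiring is shown only as comments because compiling it needs the OptiX SDK headers.

```cpp
#include <cassert>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: formats one log line as "[level][tag]: message".
std::string formatLogLine(unsigned int level, const char* tag, const char* message)
{
    std::ostringstream out;
    out << "[" << std::setw(2) << level << "][" << std::setw(12) << tag << "]: " << message;
    return out.str();
}

// Same parameter list as OptixLogCallback; the user-data pointer is unused here.
void contextLogCallback(unsigned int level, const char* tag, const char* message, void* /*cbdata*/)
{
    std::cerr << formatLogLine(level, tag, message) << "\n";
}

// Wiring in a real build that includes <optix.h> (comments only, needs the SDK):
//   OptixDeviceContextOptions options = {};
//   options.logCallbackFunction = &contextLogCallback;
//   options.logCallbackLevel    = 4;  // maximum verbosity
//   options.validationMode      = OPTIX_DEVICE_CONTEXT_VALIDATION_MODE_ALL;
//   optixDeviceContextCreate(cudaContext, &options, &optixContext);

// Small usage example: short tags are right-aligned in a 12-character field.
inline void exampleLogLine()
{
    assert(formatLogLine(4, "COMPILER", "ok") == "[ 4][    COMPILER]: ok");
}
```

Validation mode adds runtime overhead, so it is something to enable while hunting the illegal memory access and switch off again afterwards.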

Since the error is about an illegal memory access, have you verified that all CUDA device pointers you use are valid during an optixLaunch?

Maybe the issue could be that the pointers to memory for the GAS are NULL?

Sorry, what?! Are you saying you used nullptr for GAS inside the OptiX render graph or are you only guessing that this could be an issue?

If you really use nullptr for GAS device pointers, don’t do that! You wouldn’t have a valid traversable handle for that either. That doesn’t make any sense in an OptiX render graph. Just leave empty GAS out when building the top-level IAS.

Well, the traversable handle used inside the optixTrace call is allowed to be NULL. That will immediately invoke the miss program.

Would that only show up when I call optixTrace?

Most likely. If you do not traverse the acceleration structures, they wouldn’t need to be accessed.

What are you actually implementing?
(Also why are you using own functions like F32_asuint() instead of the CUDA built-in device functions __float_as_uint() and __uint_as_float() which do the same?)
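
For reference, this is what such a bit reinterpretation does, written as a host-side C++ sketch. The names floatAsUint and uintAsFloat are hypothetical. In device code you would call the CUDA built-ins __float_as_uint() and __uint_as_float() directly, and I am assuming the generated F32_asuint() and U32_asfloat() helpers behave the same way.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");

// Reinterprets the bits of a float as a 32-bit unsigned integer (no numeric conversion).
inline std::uint32_t floatAsUint(float value)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    return bits;
}

// Inverse operation: reinterprets 32 bits as a float.
inline float uintAsFloat(std::uint32_t bits)
{
    float value;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}

// Small usage example: a float packed into a payload register round-trips exactly.
inline void exampleRoundTrip()
{
    const float t = 3.5f;
    assert(uintAsFloat(floatAsUint(t)) == t);
}
```

This is how float values travel through the 32-bit unsigned payload registers of optixTrace: they are packed with the float-to-uint cast before the call and unpacked with the inverse cast in the hit or miss program.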

I did notice this strange asm code that the cuda code is generating.

Yes, because all OptiX device functions are inline assembly code replaced by the OptiX internal compiler.
Please have a look into the optix_device_impl.h header.

At this point, there is not enough information to even begin analyzing your problem remotely.

If you could provide a minimal and complete reproducer project in failing state including source code, it might be possible to investigate this further.

The GAS is not NULL, I checked to be sure.
The code is auto-generated by Slang, so that’s why it uses weird built-in functions.
I’m using PTX embedded as a string using bin2c because this is a python extension.

Working Computer specs:
RTX 3090, Driver 550.54.14, CUDA 12.4. I’ve tested Optix 7.* on this machine and it works fine.
OS: Manjaro.

Non-working computer specs
RTX 4090, Driver 525.147.05, CUDA 12.0. I believe the Optix version is 7.4.
I can’t update the driver version.
nvcc: Build cuda_12.3.r12.3/compiler.33492891_0 ??? I’m not actually sure what is being used to build the program.
OS: Debian based

The build setup is rather difficult on the second setup, as you might be able to tell. It’s certainly the fault of the second setup, which I have no ability to change. However, is there some way I can work around that?

The first thing to try is downgrading CUDA on the non-working machine from 12.0 to 11.4. Note that OptiX 7.4 was released against CUDA Toolkit 11.4, and OptiX 7.6 was released against CUDA Toolkit 11.8. While using a CUDA 12 Toolkit should work in theory, you should first try the toolkit that matches the OptiX SDK. Jumping to the latest CUDA toolkit that is ahead of any OptiX release could expose you to instability and issues we haven’t yet been able to test or address. (Even OptiX 8.0’s recommended CTK is version 12.0).

Please also try to reduce the number of things changed in the setup of your 2 machines. Between these 2 machines, almost everything is different – different GPUs, different versions of OptiX, different drivers, different versions of CUDA, and even different OSes. There are too many things that could go wrong, and no way for us to even start speculating without a reproducer.

If you can install CUDA 11.4 and a 525 driver on both machines, and then test OptiX 7.4 vs 7.6, that would at least narrow things down a little. If you can’t change things on one of these machines, it might be worth doing the tests on the one machine you can change, even if it means spending time installing and re-installing things. Getting it to fail on your currently working computer would be useful.

There are still some more triage ideas that Detlef suggested too, but given the use of slang and bin2c I would agree with Detlef that a reproducer might be necessary in order to make any progress here.


David.