Debug-enabled very simple kernel weirdly crashes

shocker.0x15 · September 13, 2023, 6:04pm

Hello,

I'm hitting a kernel crash that occurs only when I create the module containing that kernel with debugging enabled.
I have attached a reproducer for this issue:

The Debug build creates the module with debugging enabled; the Release build does not.

Oddly, the issue started after I replaced the closest-hit kernel with a very simple one. (Try toggling the #if switch in optix_kernel.cu; the original kernel runs without crashing in my environment.)
The non-debug build works as expected.

The error reported at the crash is:

[ 2][       ERROR]: Error recording event to prevent concurrent launches on the same OptixPipeline (CUDA error string: unspecified launch failure, CUDA error code: 719)
Error recording resource event on user stream (CUDA error string: unspecified launch failure, CUDA error code: 719)

If I launch the program under Nsight VSE, it stops in the RG (ray generation) kernel, and the Lanes window shows a “PhysicalStackOverflow” exception.

Thanks,

Windows 11 Pro 22H2, 22621.2215
Visual Studio Community 2022, 17.7.2
OptiX 8.0.0
CUDA 12.2 Update 2
Ryzen 9 7950X
RTX 4080
Driver: 537.13

shocker.0x15 · September 21, 2023, 1:54pm

Hello OptiX dev team.
Could you take a look at the repro?

Thanks,

droettger · September 22, 2023, 7:20am

I filed an internal bug report about it.

I couldn’t build the project myself yet because I don’t have CUDA 12.2 installed.
EDIT: I have now reproduced it after installing CUDA 12.2 Update 2, which allowed me to build the solution.

droettger · October 2, 2023, 8:58am

Analysis of the error revealed that it is caused by an incorrect calculation of the direct stack space inside OptiX, which should be fixed in future R545 drivers.

As a workaround, you could either change the implementation of these two functions in your code, which triggered this case:

RT_DEVICE_FUNCTION T &operator[](uint2 idx)    <-- add RT_INLINE keyword
CUDA_COMMON_FUNCTION float3 operator*(const float3 &v)    <-- add CUDA_INLINE keyword

or increase the direct stack size in optixPipelineSetStackSize() until the crash goes away.
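A minimal sketch of the first workaround, applied to two stand-in functions shaped like the ones above. The reproducer's own RT_DEVICE_FUNCTION / RT_INLINE / CUDA_COMMON_FUNCTION / CUDA_INLINE macros aren't shown in this thread, so the sketch assumes a hypothetical FORCE_INLINE_FN macro, and Buffer2DView and Matrix3x3 are hypothetical types. Under nvcc the macro expands to __forceinline__ __host__ __device__; with a plain host compiler it falls back to inline and defines minimal uint2/float3 stand-ins so the block can be built and checked on the CPU.

```cpp
// Sketch only: FORCE_INLINE_FN, Buffer2DView and Matrix3x3 are hypothetical
// stand-ins, not the reproducer's actual macros or types.
#include <cassert>  // only for checking the sketch on the host

#if defined(__CUDACC__)
// nvcc: __forceinline__ makes inlining mandatory rather than a hint.
#  define FORCE_INLINE_FN __forceinline__ __host__ __device__
#else
// Plain host compiler: ordinary inline, plus minimal stand-ins for the
// CUDA vector types that nvcc provides automatically.
#  define FORCE_INLINE_FN inline
struct uint2  { unsigned int x, y; };
struct float3 { float x, y, z; };
#endif

// Hypothetical 2D buffer view with an operator[](uint2) accessor.
template <typename T>
struct Buffer2DView {
    T* data;
    unsigned int width;  // elements per row

    FORCE_INLINE_FN T& operator[](uint2 idx) const {
        return data[idx.y * width + idx.x];  // row-major addressing
    }
};

// Hypothetical row-major 3x3 matrix with an operator*(const float3&) product.
struct Matrix3x3 {
    float3 row0, row1, row2;

    FORCE_INLINE_FN float3 operator*(const float3& v) const {
        float3 r;
        r.x = row0.x * v.x + row0.y * v.y + row0.z * v.z;
        r.y = row1.x * v.x + row1.y * v.y + row1.z * v.z;
        r.z = row2.x * v.x + row2.y * v.y + row2.z * v.z;
        return r;
    }
};
```

The only change versus a plain `__device__` helper is the inlining qualifier; the function bodies stay the same.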

Personally I always use __forceinline__ __device__ on all my OptiX device functions which are not programs or callables.

shocker.0x15 · October 2, 2023, 2:51pm

Good to hear that the cause has been identified and will be fixed.

> Personally I always use __forceinline__ __device__ on all my OptiX device functions which are not programs or callables.

I assumed __forceinline__ behaves roughly the same as C++ inline,
and that defining a function directly in the class declaration in the header file implicitly makes it inline. Is that different?

droettger · October 2, 2023, 2:54pm

inline is just a hint to the compiler. The compiler is free not to inline the code, for example depending on the number of arguments or the size of the function body; I have seen both happen with CUDA nvcc in the past. __forceinline__, however, is not a hint.
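A small sketch of the distinction, using hypothetical macro and type names (HD_FN, HD_FORCE_INLINE_FN, Vec3). A member function defined in the class body is implicitly inline in the C++ sense, which is about linkage (it may appear in every translation unit that includes the header), not a promise that calls get inlined. Marking it __forceinline__ under nvcc makes inlining mandatory. On a plain host compiler the macros fall back to ordinary inline so the block builds and runs on the CPU.

```cpp
// Sketch only: HD_FN, HD_FORCE_INLINE_FN and Vec3 are hypothetical names.
#include <cassert>  // only for checking the sketch on the host

#if defined(__CUDACC__)
#  define HD_FN              __host__ __device__
#  define HD_FORCE_INLINE_FN __forceinline__ __host__ __device__
#else
#  define HD_FN
#  define HD_FORCE_INLINE_FN inline
#endif

struct Vec3 {
    float x, y, z;

    // Defined in the class body, so implicitly 'inline' in the C++ sense.
    // Whether a call to it is actually inlined is still the compiler's choice.
    HD_FN float dot(const Vec3& o) const { return x * o.x + y * o.y + z * o.z; }

    // Under nvcc, __forceinline__ requires inlining instead of suggesting it.
    HD_FORCE_INLINE_FN Vec3 scaled(float s) const { return Vec3{x * s, y * s, z * s}; }
};
```

Both functions behave identically; only the second one carries the guarantee discussed above.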

shocker.0x15 · October 2, 2023, 3:13pm

Thanks for the clarification.
I’ll close this issue.

system · October 16, 2023, 3:14pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
