I am seeing a kernel crash that occurs only when the module containing that kernel is created for debugging.
I attached a reproducer for this issue:
The Debug build creates the module for debugging; the Release build does not.
Weirdly, this issue started to happen after I reduced the closest-hit kernel to a very simple one. (Try toggling the #if switch in optix_kernel.cu. The original kernel runs without the crash in my environment.)
The non-debugging build works as expected.
The error reported at the crash is:
[ 2][ ERROR]: Error recording event to prevent concurrent launches on the same OptixPipeline (CUDA error string: unspecified launch failure, CUDA error code: 719)
Error recording resource event on user stream (CUDA error string: unspecified launch failure, CUDA error code: 719)
If I launch the program with Nsight VSE, the program stops during the RG kernel. At that point I can see a “PhysicalStackOverflow” exception in the Lanes window.
Thanks,
Windows 11 Pro 22H2, 22621.2215
Visual Studio Community 2022, 17.7.2
OptiX 8.0.0
CUDA 12.2 Update 2
Ryzen 9 7950X
RTX 4080
Driver: 537.13
I couldn’t build the project myself yet because I don’t have CUDA 12.2 installed.
EDIT: I have reproduced it now after installing CUDA 12.2 Update 2, which allowed me to build the solution.
An analysis of the error revealed that it is caused by an incorrect calculation of the direct stack space inside OptiX, which should be fixed in future R545 drivers.
As a workaround, you could either change the implementation of the two functions in your code that were responsible for this case:
Good to hear that the cause has been identified and will be fixed.
Personally I always use __forceinline__ __device__ on all my OptiX device functions which are not programs or callables.
I had thought that __forceinline__ behaves much the same as C++ inline.
Defining a function directly in the class declaration in a header file implicitly makes it inline. Is __forceinline__ different from that?
inline is just a hint for compilers. The compiler is free not to inline the code, for example depending on the number of arguments or the size of the function body, both of which I have seen happen with CUDA nvcc in the past. __forceinline__ is not a hint, though.
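To make the difference concrete, here is a minimal sketch of how the two qualifiers might be written on small helper functions. It is not the code from the reproducer: the macro names RT_INLINE_HINT and RT_FORCEINLINE, the Vec3 type, and dot3 are all made up for illustration. It also uses __host__ __device__ rather than plain __device__ so the same header can be compiled for the host; inside OptiX device code, __forceinline__ __device__ as mentioned above would be the usual form.

```cpp
// Hypothetical macro names, not part of CUDA or OptiX.
// Under nvcc they expand to the CUDA qualifiers; in a plain C++ build
// they fall back to `inline`, so this header also compiles host-only.
#if defined(__CUDACC__)
#define RT_INLINE_HINT inline __host__ __device__           // a hint only
#define RT_FORCEINLINE __forceinline__ __host__ __device__  // not a hint
#else
#define RT_INLINE_HINT inline
#define RT_FORCEINLINE inline  // the host fallback cannot force inlining
#endif

// Hypothetical helper type for illustration.
struct Vec3 {
    float x, y, z;

    // Defined inside the class body, so it is implicitly inline anyway.
    // As with explicit `inline`, the compiler may still emit a real call.
    RT_INLINE_HINT float sum() const { return x + y + z; }

    // With __forceinline__, nvcc is asked to inline this at every call site.
    RT_FORCEINLINE float lengthSquared() const { return x * x + y * y + z * z; }
};

// Free helper function written in the same style.
RT_FORCEINLINE float dot3(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
```

Whether forcing the inlining avoids the stack-size problem for a particular kernel is something to verify against your own build; the sketch only shows where the qualifiers go.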