I am experiencing a very weird issue. To briefly describe my setup: from a ray generation program I make (through a few function calls) a trace call into a two-level AS with one level of instancing and a custom primitive. In the intersection program, I use the instance index of the hit primitive to index into an array of structs. I then call a function and pass the struct I found along. This first function passes a variable to a second function. In the second function, I check that the value of this variable is still the same, using a home-made assertion macro. The macro prints that an error has occurred and then dereferences a null pointer to trigger a fault and terminate the program. The problem is that the assertion fails. It should not, since the value is not modified between passing it and asserting on it.
The interesting part is that whether the assertion in the second function fails or succeeds depends on factors that should have no influence on it. For example, adding a single print statement to the first function, or forcing the first function not to be inlined (with __noinline__), makes the assertion succeed. Otherwise, the assertion fails and the program crashes. My initial guess was that data is being written out of bounds somewhere (possibly overwriting the program code), but over the past two days I could not find such an issue.
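To make the structure concrete, here is a minimal sketch of the call chain described above. It is not the actual code: all names (`InstanceData`, `DEVICE_ASSERT`, `firstFunction`, `secondFunction`) are hypothetical, and the guard macros are only there so that the snippet also compiles as plain C++ outside of nvcc. In the real program the struct would be looked up in the intersection program using the instance index of the hit.

```cpp
#include <cstdio>

// The CUDA qualifiers only exist under nvcc; fall back to nothing elsewhere.
#if defined(__CUDACC__)
#define DEVICE_FN   __host__ __device__
#define NOINLINE_FN __noinline__
#else
#define DEVICE_FN
#define NOINLINE_FN
#endif

// Hypothetical home-made assertion: print a message, then fault on purpose
// by writing through a null pointer so that the program terminates.
#define DEVICE_ASSERT(cond)                                        \
    do {                                                           \
        if (!(cond)) {                                             \
            printf("Assertion failed: %s (%s:%d)\n",               \
                   #cond, __FILE__, __LINE__);                     \
            *(volatile int*)0 = 0; /* deliberate null write */     \
        }                                                          \
    } while (0)

// Hypothetical per-instance record, indexed by the instance index.
struct InstanceData {
    int   id;
    float scale;
};

// Second function: checks that the value it received is unchanged.
DEVICE_FN int secondFunction(int value, int expected)
{
    DEVICE_ASSERT(value == expected);
    return value;
}

// First function: forwards a value to the second function unmodified.
// NOINLINE_FN is the switch that changes the outcome in the real code;
// remove it here to get the default inlining behaviour.
NOINLINE_FN DEVICE_FN int firstFunction(const InstanceData& data)
{
    const int value = data.id;
    return secondFunction(value, data.id);
}
```

In this reduced form the assertion can never fire, which is exactly why the failure in the full program is so surprising: nothing between the two functions touches the value.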
My problem looks very similar to this post. There, it is suggested that the cause of the poster's issue may be a compiler bug. One approach offered is to disable optimization and see whether the issue persists. I am using the CMake sample framework from the SDK, and tried to do this by inserting the following two lines into the top-level CMakeLists.txt, at line 210:
list(APPEND CUDA_NVCC_FLAGS -Xptxas -O0)
list(APPEND CUDA_NVRTC_FLAGS -Xptxas -O0)
However, this did not seem to make a difference. I also made sure to clear the OptiX cache and to reconfigure and regenerate the CMake build from scratch. My first question is: is this the correct way to disable optimization for device code compilation? I am not sure, since the whole CMake setup of the SDK samples is quite involved.
My second question is: what (else) could cause such an issue? As I said, I have not been able to find any out-of-bounds memory writes, but are there other problems that could cause something like this? I understand that my explanation of the situation is not great, but I cannot reliably reduce the code to a reproducible example (removing code makes the crash disappear). I am also not at liberty to share the entire code (not that doing so would help, because the relevant part is quite large).
By the way, this is on OptiX 7.3, CUDA 11.1, and on an RTX 2070 GPU.