Undefined Behaviour Solved But Root Cause Unclear

While testing my OptiX 7.5-based path tracer application (see app design details), I resolved several “undefined behavior” situations.
Each time, finding the root cause took much longer than expected. The validation mode messages helped a lot, but in one specific case there was no crash, no validation error and no exception. Instead, on a random render frame, execution looped over and over inside nvcuda64.dll; often it simply never returned from a cudaDeviceSynchronize() or optixLaunch() call (see screenshots).
I still have not been able to identify the “real” root cause, although I fixed a global logic error so that this .cu kernel is no longer used in that case. (The kernel was not required, because it would only have simulated an invisible “virtual” geometry, which is now simply not defined at all; its GAS and IAS entries have been removed.)
So for now the problem is solved for me, and my question is rather what the underlying technical reason for such a situation could have been.

It's clear that a kernel which is not designed properly may cause problems, but because there was no error message and no validation mode message, I had to search for a very long time to find the root cause.
The kernel in question (a closest-hit program) is designed to handle multiple cases. Somehow the driver seems to repeat something internally, which then causes the frozen application state.
I cannot provide a minimal reproducer, because it only happens in the complex application and I simply don’t know what exactly went wrong in that kernel. The kernel works very well in all other cases.
In general, rendering proceeds without problems as long as the geometry update for that subset is not performed. But if it is performed (on every final frame), then after a random frame (often frame 5, sometimes 27, somewhere in between, or later) the renderer suddenly hangs at the position shown in the screenshot.
The geometry update (updating vertex buffers and rebuilding the GAS and IAS) works without problems in all other cases. The geometry in the failing case was a custom primitive: a sphere defined through a custom intersection program (not the new built-in sphere primitive!). That geometry works fine in all other cases, including when used with exactly that closest-hit .cu kernel. So it was clearly an implementation problem, but it is unclear what the faulty code actually led to: if invalid input data were the reason, I would normally expect a crash or incorrect visual output rather than a hang.
It also does not seem to be a memory issue.
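For context on the failing geometry: with a custom sphere primitive, the ray/sphere test is user code rather than the built-in sphere. The following is a minimal host-side C++ sketch of the math such an intersection program typically evaluates; it is not the author's kernel and does not reproduce or explain the hang. The `Vec3` type and the helper names are hypothetical. In a real OptiX 7 intersection program the ray would come from `optixGetObjectRayOrigin()`, `optixGetObjectRayDirection()`, `optixGetRayTmin()` and `optixGetRayTmax()`, and a hit would be reported with `optixReportIntersection()`.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Hypothetical minimal vector type for this sketch.
struct Vec3 { float x, y, z; };

inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Returns the nearest t in [tmin, tmax] where origin + t * dir hits the sphere
// (center, radius), or std::nullopt on a miss. Uses the "half-b" quadratic form.
std::optional<float> intersectSphere(Vec3 origin, Vec3 dir, Vec3 center,
                                     float radius, float tmin, float tmax) {
    const Vec3 oc = sub(origin, center);
    const float a = dot(dir, dir);
    assert(a > 0.0f && "ray direction must be non-zero");
    const float halfB = dot(oc, dir);
    const float c = dot(oc, oc) - radius * radius;
    const float disc = halfB * halfB - a * c;
    if (disc < 0.0f) return std::nullopt;   // ray misses the sphere entirely
    const float sq = std::sqrt(disc);
    const float t1 = (-halfB - sq) / a;     // near root (entry point)
    const float t2 = (-halfB + sq) / a;     // far root (exit point, used when starting inside)
    if (t1 >= tmin && t1 <= tmax) return t1;
    if (t2 >= tmin && t2 <= tmax) return t2;
    return std::nullopt;                    // both roots lie outside [tmin, tmax]
}
```

Restricting the result to [tmin, tmax] mirrors how an intersection program is expected to report only hits inside the current ray interval.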

Is it possible for an employee to tell from the screenshots what kind of checks the driver performs at the address (RIP register) - (module base address of nvcuda64.dll)?

So what is the driver attempting to do there?

In VS2019 the debugger can be paused and resumed again and again, and as you can see, execution is then somewhere else in the driver each time. However, the call stack does not change completely; only the most recent entries change. In the DEBUG build, the stack remains unchanged beyond the loaded address 0007ffaefef3d94h; after subtracting the module’s base address 0007ffaefc40000h, its RVA in the .DLL image is 2b3d94h.
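The RVA arithmetic can be checked at compile time with a few lines of C++. This is a sketch: the constant names are hypothetical, and the values are the DEBUG-build addresses quoted above, written as C++ hex literals instead of the debugger's trailing-`h` notation.

```cpp
#include <cstdint>

// RVA (relative virtual address) = absolute address - module load base.
constexpr std::uint64_t rva(std::uint64_t address, std::uint64_t moduleBase) {
    return address - moduleBase;
}

// Addresses from the paused DEBUG build (hypothetical constant names).
constexpr std::uint64_t kStackAddress = 0x00007FFAEFEF3D94ULL;  // 0007ffaefef3d94h
constexpr std::uint64_t kNvcudaBase   = 0x00007FFAEFC40000ULL;  // 0007ffaefc40000h

static_assert(rva(kStackAddress, kNvcudaBase) == 0x2B3D94ULL,
              "offset of the stack entry inside the nvcuda64.dll image");
```

Because Windows may load nvcuda64.dll at a different base address in each process (ASLR), the RVA rather than the absolute address is the value that stays comparable across runs of the same driver build.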

I also had the same problem with a driver earlier than version 516.59. I could update to the newest driver to check again, but updating did not help in the past.

Thank you!

My System:
OptiX 7.5.0 SDK
CUDA 11.7
GTX 1050 2GB
Win10PRO 64bit (version 21H1; build 19043.1237)
device driver: 516.59
VS2019 v16.11.17
MDL SDK 2020.1.2
Windows SDK 10.0.19041.0

If this is related to a specific *.cu module being present in your pipeline, it could only be analyzed with a reproducer in the failing state.
The stack address alone would not provide enough information about what happens inside nvcuda64.dll, even if it allowed finding the source code location.

What is the expected performance of your renderer around that time?
That is, is your frame rate below 1 fps, i.e. anywhere near the standard Windows Timeout Detection and Recovery (TDR) limit of 2 seconds?
If yes, does the behavior change if you make the render resolution a lot smaller?
It would also be interesting to know whether this is related to the graphics board.

Even after updating to device driver 517.48, the exact same issue still occurs at movie frame 71/100 (= more than 355 total frames).

Accumulation frames take between 300 and 600 milliseconds. The live-view frame rate is about 2 fps.
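These numbers can be compared directly with the 2-second TDR limit from the question. A minimal C++ sketch, assuming one accumulation frame corresponds to one GPU submission that the TDR watchdog times as a whole (an assumption, not confirmed in the thread); `tdrBudgetUsed`, `timeFrame` and `kDefaultTdrDelay` are hypothetical helpers. With 300 to 600 ms per frame, only about 15 to 30 % of the default budget is used.

```cpp
#include <cassert>
#include <chrono>

// Standard Windows TDR limit quoted in the thread (2 seconds).
constexpr std::chrono::milliseconds kDefaultTdrDelay{2000};

// Fraction of the TDR budget consumed by one frame (assumed: one GPU submission).
double tdrBudgetUsed(std::chrono::milliseconds frameTime,
                     std::chrono::milliseconds tdrDelay = kDefaultTdrDelay) {
    assert(tdrDelay.count() > 0);
    return static_cast<double>(frameTime.count()) /
           static_cast<double>(tdrDelay.count());
}

// Host wall-clock time of one call to `work`. In the app, `work` would be a
// hypothetical wrapper around optixLaunch() followed by cudaDeviceSynchronize(),
// so the measured time approximates the GPU time of that launch.
template <typename Work>
std::chrono::milliseconds timeFrame(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
}
```

A steady clock is used because it is monotonic; the system clock can jump and distort frame timings.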

I’ll send you a download link to the app in the failing state via PM.