The OptiX code I help develop causes Turing GPUs (at least the RTX 2060 and 2070) to hang at 100% GPU usage, but only when printing is disabled. With printing enabled, it works fine. It also works fine on Pascal cards, with or without printing enabled.
The only rtPrintf call in our programs is in an exception program, and exceptions are disabled when the hang occurs. With both exceptions and printing enabled, I see no indication that any exception is thrown.
The problem seems to depend on the number of if statements (or similar branches) in the code: commenting out some of them makes the code run. Of course, divergent code hurts performance, but I wouldn’t expect it to hang the GPU entirely. Unfortunately, I can’t reduce the branching enough without changing the results, at least not in all our projects. In one of them, I got it working by rewriting an any-hit program for shadow rays so that it no longer calls rtTerminateRay, but that may cost performance on other architectures, and I’m afraid it might break again as soon as another feature is added.
Driver version: 440.82 (the problem also occurred with 440.59). OS: openSUSE 15. All the OptiX samples run fine.
Is there a known issue with divergent code causing hangs on the Turing architecture? Are there any known workarounds, or ideas on how to troubleshoot this further?