OptiX 6.5 - Hanging Turing GPU unless enabling rtPrintf

The OptiX code I’m involved in developing causes Turing GPUs (or at least the RTX 2060 and 2070) to hang at 100% GPU usage, but only if printing is disabled. With printing enabled, it works fine. It also works fine on Pascal cards, with or without printing enabled.

The only rtPrintf statement in our programs is in an exception program, and exceptions are turned off when it hangs. With exceptions and printing turned on, I don’t get any indication of an exception being thrown.

The problem seems to depend on the number of if statements (or similar) in the code. Commenting out some of these makes the code run. Of course divergent code has an adverse effect on performance, but I wouldn’t expect it to cause the GPU to hang entirely. Unfortunately I’m not able to reduce the branchiness enough without affecting the result, at least not for all our projects. In one of them I’m able to get it to work by rewriting an any hit program for shadow rays so that it doesn’t call rtTerminateRay, but that might cost performance on other architectures, and I’m afraid it might break anyway as soon as some other feature is added.

Current driver version: 440.82 (also happened in 440.59), OS: OpenSUSE 15. All the OptiX samples run.

Is there a known problem with diverging code causing freezes on the Turing architecture? Any known workarounds or ideas on how to troubleshoot it further?

The OptiX core implementation resides inside the driver for OptiX 6.5.0, but because you’re already on the most current Linux driver, 440.82, there isn’t anything newer to test.

Which CUDA Toolkit version did you use to generate the PTX code?
Are you using GeometryTriangles primitives?

From your descriptions this sounds very much like a device code compilation issue.
It won’t be possible to investigate that further without a minimal, complete reproducer in the failing state.

Note that the code generation can change considerably with exceptions enabled, which supports the compilation problem thesis.

rtPrintf was notoriously buggy in some OptiX versions. You could try to use the CUDA native printf instead.
That doesn’t need any of the OptiX print enables. Limiting it to specific launch indices would need to be done manually.
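
As a rough sketch of that manual limiting: the snippet below guards a native printf with a comparison against one chosen launch index. The names LaunchIndex, isDebugIndex and debugPrintHit are made up for illustration and are not part of OptiX. In an OptiX 6.x program the current index would normally come from a uint2 variable declared with the rtLaunchIndex semantic; here a small struct stands in for it so the code also compiles as plain C++ on the host.

```cpp
#include <stdio.h>

// Compiles as plain C++ on the host; under nvcc the functions are also
// callable from device code.
#ifdef __CUDACC__
#define HOSTDEVICE __host__ __device__
#else
#define HOSTDEVICE
#endif

// Hypothetical stand-in for the uint2 launch index that an OptiX 6.x program
// would normally declare with the rtLaunchIndex semantic.
struct LaunchIndex {
  unsigned int x;
  unsigned int y;
};

// True only for the single launch index we want output from.
HOSTDEVICE inline bool isDebugIndex(LaunchIndex current, LaunchIndex wanted) {
  return current.x == wanted.x && current.y == wanted.y;
}

// Prints one line for the selected launch index and reports whether it printed.
// All other launch indices stay silent, which keeps the output readable.
HOSTDEVICE inline bool debugPrintHit(LaunchIndex current, LaunchIndex wanted, float tHit) {
  if (!isDebugIndex(current, wanted)) {
    return false;
  }
  printf("launch index (%u, %u): t_hit = %f\n", current.x, current.y, tHit);
  return true;
}
```

The wanted index could be hard-coded while debugging or passed in as a program variable; either way, nothing here depends on the OptiX print settings.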

With respect to the anyhit program, just in case, note that both the rtTerminateRay() and the rtIgnoreIntersection() functions immediately return. Make sure to update all your payload values before them.

Not calling rtTerminateRay() for a visibility test ray can have drastic performance implications depending on the scene.

Another thing to test would be whether all rays are valid. That means no NaN or INF values in the origins and directions, all directions normalized and never the null vector, and 0.0f <= t_min < t_max.
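
The ray checks listed above could be sketched roughly as follows. This is a host-side C++ illustration only: Vec3, isFiniteVec, isValidRay and the tolerance value are assumptions made for the example, not OptiX API. In device code the same tests would be applied to the float3 origin and direction right before the trace call.

```cpp
#include <cmath>

// Hypothetical minimal vector type standing in for float3.
struct Vec3 {
  float x;
  float y;
  float z;
};

// True if no component is NaN or +/-INF.
inline bool isFiniteVec(const Vec3& v) {
  return std::isfinite(v.x) && std::isfinite(v.y) && std::isfinite(v.z);
}

// Checks the conditions from the post: finite origin and direction,
// normalized (hence non-null) direction, and 0.0f <= tMin < tMax.
// The tolerance for the length test is an arbitrary choice for this sketch.
inline bool isValidRay(const Vec3& origin, const Vec3& direction,
                       float tMin, float tMax, float tolerance = 1.0e-4f) {
  if (!isFiniteVec(origin) || !isFiniteVec(direction)) {
    return false;
  }
  // Written as a negated conjunction so that a NaN interval bound also fails.
  if (!(0.0f <= tMin && tMin < tMax)) {
    return false;
  }
  const float lengthSquared = direction.x * direction.x +
                              direction.y * direction.y +
                              direction.z * direction.z;
  // A null vector has squared length 0 and is rejected here as well.
  return std::fabs(lengthSquared - 1.0f) <= tolerance;
}
```

Running such a check on primary, shadow, and reflection/refraction rays alike, and printing the launch index of the first failure, narrows down where a bad ray is generated.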

Thanks Detlef for your reply and suggestions!

The CUDA version is 10.1.243.
We do not use GeometryTriangles in our current implementation, but triangle meshes with intersection and BB programs.

The payload was updated before rtTerminateRay(). In the case of rtIgnoreIntersection() we don’t want to update it. I will try to reintroduce the rtTerminateRay() call if I can get it to work.
I will check the rays - I think I’ve checked that the directions of the primary rays are normalized, but I’m not sure about the origins. I haven’t checked the shadow rays and reflection/refraction rays yet, so I’ll get working on that. Also I’ll try using the CUDA printf.

If none of this helps, I hope we can create a minimal reproducer next.

We do not use GeometryTriangles in our current implementation, but triangle meshes with intersection and BB programs.

Interesting, then you’re actually not using the RT core hardware triangle intersection on the RTX boards but only the hardware BVH traversal. That means there is huge potential to get more performance on the RTX boards.

The OptiX Advanced Samples on https://forums.developer.nvidia.com/t/optix-advanced-samples-on-github/48410 were written against OptiX 5.1.0 and would run similarly with OptiX 6.5.0 on RTX boards.

But if you say that all other SDK examples work, there is little that can be done about this remotely.
In any case, this shouldn’t happen when everything else is correct.
Long term, I would recommend porting to OptiX 7 before changing the existing renderer further.

Thanks, I’m aware of this and I’m sure we’ll want to fix it as soon as time allows. The same goes for porting to OptiX 7. We’ll probably try that before attempting to create a minimal reproducer. Checking the rays didn’t reveal anything strange, but we may need to check our intersection programs as long as we use them.