OptiX 6.0: "RTX acceleration is supported on Maxwell and newer GPUs "

The release notes for OptiX 6.0.0 read as follows:

“RTX acceleration is supported on Maxwell and newer GPUs but require Turing GPUs for RT Core acceleration”

I’m not sure I understand the part about “Maxwell and newer GPUs”. I thought RTX acceleration wasn’t available on GPUs older than Turing.

Yeah, that’s formulated confusingly.

What it means is the new “RTX execution strategy” in OptiX, which is just a name for the new core OptiX 6.0.0 implementation, as opposed to the “mega-kernel execution strategy” used in all previous OptiX versions.

The new RTX execution strategy makes it possible to use the RT cores in Turing RTX GPUs for BVH traversal and triangle intersection.
On GPUs without RT cores, the bulk of that new code path is still used for shader compilation, acceleration structure builds, multi-GPU support, etc., but BVH traversal and triangle intersection run on the streaming multiprocessors as before.

To invoke the hardware triangle intersection routine (or a fast built-in routine on boards without RT cores), you need to change your OptiX application to use the new GeometryTriangles and attribute programs.
GeometryTriangles have neither a bounding box nor an intersection program. Instead, they have a new attribute program that lets you fill in your developer-defined attribute variables, so you can use the same any hit and closest hit programs for custom primitives in Geometry nodes and for GeometryTriangles.
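To make that division of labor concrete, here is a CPU-side model in plain C++. It is not OptiX device code, and all names are hypothetical. The built-in or hardware intersection only yields the hit's barycentrics, and the attribute program turns them into the attributes the any hit and closest hit programs read, here an interpolated shading normal. The weighting assumes the convention used by OptiX 6's `rtGetTriangleBarycentrics()`, where the two barycentrics weight vertices 1 and 2 and the remainder weights vertex 0.

```cpp
#include <cassert>
#include <cmath>

// Minimal 3-component vector standing in for CUDA's float3 (hypothetical helper).
struct Float3 {
    float x, y, z;
};

Float3 operator+(const Float3& a, const Float3& b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Float3 operator*(float s, const Float3& v) { return {s * v.x, s * v.y, s * v.z}; }

Float3 normalize(const Float3& v) {
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(len > 0.0f && "cannot normalize a zero-length vector");
    return {v.x / len, v.y / len, v.z / len};
}

// All the triangle intersection hands over in this model: the hit's barycentrics.
struct Barycentrics {
    float u; // weight of vertex 1
    float v; // weight of vertex 2
};

// Developer-defined attributes that the any hit / closest hit programs read.
struct HitAttributes {
    Float3 shadingNormal;
    Barycentrics bary;
};

// Model of an attribute program: interpolate per-vertex normals at the hit point.
HitAttributes triangleAttributes(const Float3& n0, const Float3& n1, const Float3& n2,
                                 Barycentrics b) {
    const float w0 = 1.0f - b.u - b.v; // remaining weight goes to vertex 0
    return {normalize(w0 * n0 + b.u * n1 + b.v * n2), b};
}
```

Because the closest hit program in this model only consumes `HitAttributes`, it does not care whether those values came from a custom intersection program or from the triangle path, which is the point of the attribute program.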

Unfortunately, the OptiX Programming Guide is slightly behind in explaining that. The online version will hopefully be updated soon.
The OptiX API Reference document and of course the headers contain the necessary function explanations and the optixGeometryTriangles example demonstrates the new GeometryTriangles and attribute program usage.

Thanks for explaining, Detlef! Just to make sure, I’m doing it as follows:

```cpp
// Create an OptiX context
g_context = optix::Context::create();

// Set RTX global attribute
const int RTX = 1;
if (rtGlobalSetAttribute(RT_GLOBAL_ATTRIBUTE_ENABLE_RTX, sizeof(RTX), &RTX) != RT_SUCCESS)
  printf("Error setting RTX mode.\n");
else
  printf("OptiX RTX execution mode is %s.\n", RTX ? "on" : "off");

// Set other attribute variables
g_context->setStackSize(5000);

// ...
```

a) Is it correct to do it right after creating the context, like I did?
b) I suppose turning the RTX mode on is recommended whenever possible, correct?
c) When using the C++ interface, is there any other way to turn it on besides using the rtGlobalSetAttribute function? I tried to take a look at the programming guide from the SDK folder but I didn’t find anything.
d) Since you mentioned the new GeometryTriangles, is it supposed to be faster than, say, the old sutil/triangle_mesh.cu approach?
e) Are there any plans to add support for Selectors and visitor programs in RTX mode in the future?

a) No, you should set the RT_GLOBAL_ATTRIBUTE_ENABLE_RTX global attribute before calling optix::Context::create().
(Also note that there is a new stack size API that lets you set the maximum recursion depth you expect, instead of the old byte-based stack size, which is then no longer used.)

b) Yes, that RTX execution strategy should become the default and is also faster for boards without RT cores in many areas.
EDIT: There is no need to enable the RTX execution strategy in newer drivers anymore.

c) No. rtGlobalSetAttribute is not tied to an OptiX object, so there is no C++ wrapper for it.
The same applies to the rtDevice* functions, which you can also call before creating a context. Example code here:

d) Yes, and you must use the new GeometryTriangles to invoke the Turing RT core triangle intersection hardware.
GPUs without RT cores can become faster this way as well, because the attribute program's evaluation is deferred, whereas attribute calculations inside an intersection program might not be, and the intersection program is the most frequently called program for custom geometric primitives.
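A rough way to see why deferring helps is to count attribute evaluations along one ray. The sketch below is a simplified CPU model with hypothetical names, not OptiX code. The eager variant computes attributes every time a candidate becomes the new closest hit, which is the worst case for a custom intersection program that computes attributes before reporting. The deferred variant only records the hit distance and primitive during traversal, then evaluates attributes once for the final closest hit. It assumes there is no any hit program.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// One candidate triangle hit along a ray, in traversal order (hypothetical model).
struct Candidate {
    float t;       // hit distance along the ray
    int primitive; // primitive index
};

struct TraceResult {
    int closestPrimitive = -1; // -1 means the ray missed
    int attributeEvaluations = 0;
};

// Eager: attributes are computed every time a candidate becomes the new closest
// hit, as a custom intersection program does before reporting the intersection.
TraceResult traceEager(const std::vector<Candidate>& candidates) {
    TraceResult r;
    float tClosest = std::numeric_limits<float>::infinity();
    for (const Candidate& c : candidates) {
        assert(c.t >= 0.0f && "hit distances are non-negative");
        if (c.t < tClosest) {
            tClosest = c.t;
            r.closestPrimitive = c.primitive;
            ++r.attributeEvaluations; // wasted if a closer hit shows up later
        }
    }
    return r;
}

// Deferred: traversal only records t and the primitive; the attribute program
// runs once, for the final closest hit (no any hit program in this model).
TraceResult traceDeferred(const std::vector<Candidate>& candidates) {
    TraceResult r;
    float tClosest = std::numeric_limits<float>::infinity();
    for (const Candidate& c : candidates) {
        assert(c.t >= 0.0f && "hit distances are non-negative");
        if (c.t < tClosest) {
            tClosest = c.t;
            r.closestPrimitive = c.primitive;
        }
    }
    if (r.closestPrimitive >= 0) {
        ++r.attributeEvaluations;
    }
    return r;
}
```

For candidates at t = 5, 3, 4 and 1 in traversal order, both variants find the hit at t = 1, but the eager variant evaluates attributes three times (at 5, 3 and 1) and the deferred variant only once.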

e) No, that would affect traversal performance too much. It always did, but with hardware traversal the cost would be prohibitive. The workaround is to change the scene hierarchy and rebuild, which has also become faster. There are also ray visibility masks, which allow implementing some visibility and culling methods.
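Masks act as a cheap per-ray filter in place of the per-ray traversal decisions a Selector's visitor program would make. As a hedged model (plain C++, hypothetical names and bit assignments), assuming an 8-bit mask on both the ray and the scene node, as with OptiX 6's `RTvisibilitymask`, a node is considered by a ray only when the two masks share at least one bit:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Builds a single-bit mask; this model assumes 8-bit visibility masks.
std::uint8_t maskBit(int bit) {
    assert(bit >= 0 && bit < 8 && "visibility masks have 8 bits in this model");
    return static_cast<std::uint8_t>(1u << bit);
}

// Hypothetical scene node with a visibility mask.
struct Node {
    std::string name;
    std::uint8_t visibilityMask;
};

// A ray considers a node only if the two masks share at least one bit.
bool visibleToRay(std::uint8_t rayMask, std::uint8_t nodeMask) {
    return (rayMask & nodeMask) != 0;
}

// Names of the nodes a ray with the given mask would consider during traversal.
std::vector<std::string> nodesSeenByRay(const std::vector<Node>& scene, std::uint8_t rayMask) {
    std::vector<std::string> seen;
    for (const Node& node : scene) {
        if (visibleToRay(rayMask, node.visibilityMask)) {
            seen.push_back(node.name);
        }
    }
    return seen;
}
```

For example, giving radiance rays the mask `maskBit(0)` and shadow rays `maskBit(1)` lets a node that carries only bit 1 cast shadows while staying invisible to the camera, without changing the scene hierarchy.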

Regarding deferred evaluation of the attributes: does that mean that, for GPUs without RT cores, the performance boost can only manifest if the processed geometry has no “any-hit” program associated with it?

To my understanding, in “mega-kernel execution strategy”, attribute evaluation takes place for every successfully intersected triangle. Then if the geometry has an “any-hit” program associated with it, the performed computations are potentially not wasteful. Similarly, in the RTX execution strategy attribute evaluation will take place for every intersected triangle, provided it has an “any-hit” program associated with it. As a result, the performance is the same for both strategies.

Conversely, if there is no “any-hit” program, then in the “mega-kernel execution strategy” attribute evaluation is performed for naught, whereas the RTX strategy will perform only the one necessary attribute evaluation, i.e. for the “closest-hit” program. As a result, the RTX strategy shows superior performance.

Is that understanding correct? Please correct me if I am wrong, much appreciated.

The attribute program will be used inside the anyhit program as well.
Calculations of attributes that are never read will be removed as dead code.
This is the same for RTX and pre-Turing GPUs.

“To my understanding, in “mega-kernel execution strategy”, attribute evaluation takes place for every successfully intersected triangle”

Only if OptiX was not able to optimize that automatically. It did that in the past, but not in all cases, which is why I said it “might not” do that.
With the attribute program in the RTX execution strategy, which is now the default, it always will.

Having an anyhit program will generally run slower than not having one. This is especially true for RTX boards, where traversal and triangle intersection run in hardware on the RT cores but anyhit programs run on the streaming multiprocessors.
You'll need the attributes in the closesthit program at the latest.

There are even more possibilities now. For scenes with only opaque materials (meaning no cutout opacity), OptiX 6 no longer needs an anyhit program for either the radiance or the shadow ray, because it introduced ray flags, which are hardware-accelerated on RTX GPUs and result in less execution divergence overall. The ray flags allow, for example, terminating the ray on the first hit, which is exactly what an anyhit program for the shadow ray of opaque materials does.
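The effect of terminating on the first hit can be sketched with a simplified CPU model of a shadow query (plain C++, hypothetical names; candidates are visited in traversal order, not sorted by distance). Without early termination, traversal keeps looking for the closest hit. With it, the query stops at the first accepted hit, which is what an anyhit program calling `rtTerminateRay()` does for opaque shadow rays and what, as I understand it, the `RT_RAY_FLAG_TERMINATE_ON_FIRST_HIT` ray flag passed to `rtTrace` achieves without running any program.

```cpp
#include <cassert>
#include <vector>

// Result of a simplified shadow (occlusion) query.
struct ShadowQuery {
    bool occluded = false;
    int candidatesTested = 0;
};

// Candidates are hit distances in traversal order. Without early termination the
// query keeps searching for the closest hit; with it, the first hit inside
// [0, tMax) ends traversal, which is all an opaque shadow ray needs.
ShadowQuery traceShadow(const std::vector<float>& candidateT, float tMax, bool terminateOnFirstHit) {
    ShadowQuery q;
    float tClosest = tMax;
    for (float t : candidateT) {
        assert(t >= 0.0f && "hit distances are non-negative");
        ++q.candidatesTested;
        if (t < tClosest) {
            tClosest = t;
            q.occluded = true;
            if (terminateOnFirstHit) {
                break; // first accepted hit: the light is blocked, stop here
            }
        }
    }
    return q;
}
```

For candidate distances 2, 1, 0.5 and 3 and a light at distance 10, the full search tests all four candidates while early termination stops after the first; both report the point as occluded.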
This is just one example how things can be sped up in OptiX 6 for special cases.

Please have a look at the GTC presentation S9768 - New Features in OptiX 6.0 once it becomes available (30 days after GTC 2019) on:
It shows the execution flow for these things.