Ray Tracing Performance in OptiX vs. Vulkan/DXR

Hi OptiX team,

I’m revisiting a long-standing performance discrepancy that I have been encountering in my renderer, ChameleonRT (GitHub: Twinklebear/ChameleonRT), an example path tracer that runs on multiple ray tracing backends (Embree/DXR/OptiX/Vulkan/Metal/OSPRay). ChameleonRT is a small path tracer with rendering backends written in each API (Vulkan, DXR, OptiX, Embree, Metal) that are kept as identical as possible. What I observe is that, in the path tracer version of the code, OptiX is quite noticeably slower than Vulkan and DXR. For example, in an untextured path-traced rendering of Sponza with a single hard-coded light I observe the following performance (all measured using device-side timers):

  • OptiX: 8.037 ms/frame
  • Vulkan: 1.661 ms/frame
  • DXR: 2.733 ms/frame

Screenshots of each are shown below:

[Screenshots: OptiX, Vulkan, and DXR renderings of Sponza]
When I run these through Nsight Compute/Graphics, I find that the OptiX backend has significantly more VRAM (and L1/L2) memory traffic and lower SM utilization compared to Vulkan/DXR.

OptiX Utilization

Vulkan Utilization

DXR Utilization

What I’ve also found is that if I write simpler renderers, e.g., primary rays with barycentric shading, or a basic primary hit + AO ray renderer, I don’t see this performance difference. So it seems there is some difference in how the larger path tracer kernel and BRDF code get compiled on OptiX vs. Vulkan/DXR, or some other issue that is not clear to me. I wanted to see if anything stands out in the code as being wrong. I also saw Wenzel Jakob’s post, “Ray tracing performance in OptiX vs DirectX,” mentioning that the Luisa renderer has seen similar issues in their paper.

The code for each backend is online:

Releases with all backends built for each platform (as supported by the platform) can be downloaded from the latest release (Release 0.0.10 · Twinklebear/ChameleonRT · GitHub). The barycentrics and AO versions of the renderers can also be downloaded (Release 0.0.8 OptiX + Vulkan Barycentrics & AO · Twinklebear/ChameleonRT · GitHub), with the corresponding code on the 0.0.8-optix-vk-bary-ao git tag.

Thanks for your time! I realize asking you to dig into the code is a lot, but I’ve been stumped on this issue for some time, and hopefully the code is small enough for a brief look through. Or if there are known issues and this is expected, that would be good to know as well.


Some colleagues have also mentioned that they didn’t see such a big difference on the 40-series GPUs, so that may also be the case here. Unfortunately I only have a 30-series to test on here.

Looking over the OptiX implementation, please try removing the OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_GAS code path.

The DXR and Vulkan RT implementations do not support that, so to match their behavior, please use the fully hardware-accelerated code path with a two-level acceleration structure via OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_LEVEL_INSTANCING in OptiX as well.
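For reference, this flag is set in the pipeline compile options. A minimal sketch (assuming the OptixDeviceContext, modules, and the rest of the pipeline setup already exist elsewhere in the renderer):

```cpp
// Restrict the traversable graph to hardware-accelerated single-level
// instancing, matching what DXR/Vulkan RT pipelines always use.
OptixPipelineCompileOptions pipeline_compile_opts = {};
pipeline_compile_opts.traversableGraphFlags =
    OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_LEVEL_INSTANCING;
// With this flag, the traversable handle passed to optixTrace must be an
// IAS (instance acceleration structure) — even a single GAS then needs to
// be wrapped in one identity-transform OptixInstance.
```

Note that scenes which previously launched directly against a GAS handle would then need the extra identity-transform instance wrapping it.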

You might want to check that on non-RTX boards as well. If single-level instancing turns out to be slower than the single-GAS mode there, you can make it optional based on the presence of RT cores by querying optixDeviceContextGetProperty() with OPTIX_DEVICE_PROPERTY_RTCORE_VERSION.
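That query could look roughly like this sketch (assuming a valid OptixDeviceContext named `context`; the fallback logic is an illustration, not part of ChameleonRT):

```cpp
// Query the RT core version: 0 means no RT cores (e.g., pre-Turing boards),
// non-zero values indicate hardware ray tracing support.
unsigned int rtcore_version = 0;
OptixResult res = optixDeviceContextGetProperty(
    context,
    OPTIX_DEVICE_PROPERTY_RTCORE_VERSION,
    &rtcore_version,
    sizeof(rtcore_version));

// Pick the traversable graph flag accordingly: hardware-accelerated
// single-level instancing on RTX boards, single-GAS otherwise.
unsigned int traversable_flags =
    (res == OPTIX_SUCCESS && rtcore_version > 0)
        ? OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_LEVEL_INSTANCING
        : OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_GAS;
```

The chosen flags would then feed into the OptixPipelineCompileOptions, and the scene build would either wrap the GAS in an instance or launch against it directly.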