Ray tracing performance in OptiX vs DirectX

Dear OptiX team,

An interesting paper about a new renderer (http://luisa-render.com) was released recently; it benchmarks many different systems and backends.

We are currently looking into performance differences relative to our own renderer, which look to be partly related to how polymorphism is compiled in the two systems.

Apart from that, there was another surprising observation in the paper: Figure 6 on page 16 shows a comparison of the same rendering algorithms being compiled on DirectX vs OptiX. There is a consistent near-2x performance difference across scenes.

I had always assumed that OptiX/Vulkan/DirectX are deep down just different frontends around a central library related to ray tracing (libnvidia-rtcore.so on Linux). In the thread “What are the advantages and differences between Optix 7 and Vulkan Raytracing API?”, @droettger makes some points that seem to reinforce that assumption.

In any case, I was surprised to see that there is such a difference and wanted to bring it up. The project provides an open source implementation that could likely be used to reproduce those benchmarks.


Hi @wenzel.jakob!

DirectX and OptiX do have some overlap in the core driver implementation, that’s correct. One implication could be that there is more work happening in Luisa’s “CUDA” backend implementation than in their DirectX backend. It would be very hard to speculate without a serious analysis of both backends, and some profiling to understand whether Luisa’s CUDA/OptiX backend is going as fast as possible.

The author’s explanation for the performance gap is “probably because DirectX is a lower level API featuring inline ray tracing, thus having less overhead.” (With inline ray tracing, i.e. DXR 1.1’s RayQuery, a shader traces rays and handles the results directly, rather than scheduling separate hit/miss shader invocations.) That might be true, and it could be the complete explanation, or perhaps there is more shader code generated in the OptiX case for some reason. If that is approximately the entire reason, then it will be interesting to see how this changes with the upcoming Shader Execution Reordering (SER) API.

A very brief glance at the Luisa code gives me the impression that the authors are doing the right things and have taken some care, but a complete understanding of the performance differences might take longer. It would be useful to verify whether the isolated traversal performance is identical, as expected, and if so, what the traversal/shading split is and thus what the actual shading-only slowdown is in each case. The variance across tests might suggest the slowdown is related to a feature that is used in varying amounts. If the perf gap is primarily shading, then there might be relatively easy ways to isolate whether it mostly comes down to calling-convention differences, register allocation, pure shading code, or something else.
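As a back-of-the-envelope illustration (with made-up numbers, not measurements from the paper or from Luisa), if the traversal time really is identical on both backends, the shading-only slowdown can be recovered from the total frame times like this:

```python
def shading_slowdown(total_dx: float, total_optix: float, traversal: float) -> float:
    """Estimate the shading-only slowdown of the OptiX backend,
    assuming traversal time is identical on both backends
    (all times in milliseconds per frame)."""
    shading_dx = total_dx - traversal
    shading_optix = total_optix - traversal
    return shading_optix / shading_dx

# Hypothetical numbers: a 2x gap in total frame time combined with a
# large shared traversal component implies an even larger shading-only gap.
print(shading_slowdown(total_dx=10.0, total_optix=20.0, traversal=6.0))  # → 3.5
```

The point of the exercise: the bigger the traversal share, the more the overall 2x gap concentrates into shading, which would narrow down where to look in the generated code.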