Performance regression in v510 driver

Dear OptiX team,

We’re observing performance regressions after upgrading to the latest v510 driver. For example, the rendering time of a test scene that we use for benchmarking goes from 26.6s to 33.5s (roughly a 26% slowdown). This is with unchanged host/PTX code and a megakernel (i.e. one big optixLaunch).

Is this to be expected? This is in an unreleased codebase, but I am happy to provide access in case somebody on the NVIDIA side would like to investigate.

Thanks,
Wenzel

PS We were comparing 465.19.01 and 510.39.01 on Ubuntu 20.04.3 LTS (those are the official driver versions shipped via the nvidia-driver-465 and nvidia-driver-510 APT packages).

That doesn’t sound good. We will be investigating, and may take you up on the offer of a reproducer. In the meantime, could you outline the major features you’re using? Is this with the atomics we were discussing? Are you making heavy use of anyhit programs? Custom intersections, or only triangles? Complex materials? Also, which version of the OptiX SDK are you using?


David.

This is a standard path tracer with MIS written as a raygen program. It does two ray tracing calls per loop iteration (a shadow ray for direct illumination and a BSDF sample for the next iteration). The closest hit program returns five payload values characterizing the intersection (UVs, shape & primitive ID, and ray distance).

There are no anyhit programs (they are disabled via a specified flag as well). Only triangle meshes are present in this scene.

All shading is done using a sequence of direct callables representing different BRDFs. Direct callables are also used to compute more detailed intersection information for each shape separately (normal interpolation, etc.).

The program does a bunch of atomic scatters at the end to update the film with a reconstruction filter (they only make up a tiny part of the kernel runtime).
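To illustrate what such an atomic scatter does, here is a host-side C++ sketch. It is an analogy under stated assumptions, not the renderer's actual code: the filter is assumed to be a tent filter, `atomic_add` stands in for the GPU's floating-point `atomicAdd`, and `Film`, `tent`, and `splat` are hypothetical names. Accumulating filter weights for normalization is omitted.

```cpp
#include <algorithm>
#include <atomic>
#include <cmath>
#include <cstddef>
#include <vector>

// Host stand-in for a floating-point atomic add, built from a CAS loop.
inline void atomic_add(std::atomic<float>& target, float value) {
    float expected = target.load(std::memory_order_relaxed);
    // On failure, `expected` is refreshed with the current value and we retry.
    while (!target.compare_exchange_weak(expected, expected + value,
                                         std::memory_order_relaxed)) {
    }
}

// 1D tent filter: 1 at the center, falling linearly to 0 at `radius`.
inline float tent(float d, float radius) {
    return std::max(0.0f, 1.0f - std::fabs(d) / radius);
}

struct Film {
    int width, height;
    std::vector<std::atomic<float>> pixels;  // row-major, single channel
    Film(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h) {
        for (auto& p : pixels) p.store(0.0f);
    }
};

// Scatter one sample at continuous position (x, y) onto every pixel whose
// center (i + 0.5, j + 0.5) lies within the filter footprint.
inline void splat(Film& film, float x, float y, float value, float radius = 1.0f) {
    const int x0 = std::max(0, static_cast<int>(std::ceil(x - radius - 0.5f)));
    const int x1 = std::min(film.width - 1, static_cast<int>(std::floor(x + radius - 0.5f)));
    const int y0 = std::max(0, static_cast<int>(std::ceil(y - radius - 0.5f)));
    const int y1 = std::min(film.height - 1, static_cast<int>(std::floor(y + radius - 0.5f)));
    for (int j = y0; j <= y1; ++j) {
        for (int i = x0; i <= x1; ++i) {
            const float w = tent(i + 0.5f - x, radius) * tent(j + 0.5f - y, radius);
            atomic_add(film.pixels[static_cast<std::size_t>(j) * film.width + i], w * value);
        }
    }
}
```

Because several threads may land on the same pixel, each update has to be atomic; the number of updates per sample is bounded by the filter footprint, which is consistent with this step being a small share of the kernel runtime.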

In case it’s relevant: all memory referenced by the scene is device memory (i.e. not managed etc.), but it’s allocated via the new stream-ordered asynchronous allocation functions.

Hi @wenzel.jakob,

At long last I have some good news. This bug has been fixed and scheduled for release. You should see the performance on Mitsuba’s OptiX backend return to previous levels starting with the drivers numbered 515 and higher, which are coming soon.

For posterity, and for anyone concerned about whether this affected them: the issue was related to the way that Direct Callable programs are compiled, so it would only affect people using callables heavily, and it did not seem to affect most users. The regression occurred between the 465 and 470 drivers, a little under a year ago. Wenzel privately provided a reproducer that we used to test and validate the fix.


David.


Amazing news – many thanks to you and everyone else involved!
