OptiX Prime performance issue on Windows

Dear OptiX development team,

I have started to evaluate OptiX Prime for one of our future projects. Since our clients require both Windows and Linux support, it is very important for us to achieve comparable results on both platforms. Unfortunately, I am not able to obtain similar performance when comparing the two operating systems.

Here are some details on my OptiX Prime test case:

I am shooting 4,500,000 rays at one of our meshes. I have set the origin and direction of all rays such that 100% of them intersect the mesh.
For the evaluation, I compare the computation time on Linux and on Windows.

Ray shooting on Linux is approximately 40% faster than on Windows.

What I found out so far:

  • Since our project will be based on instancing, the test uses instancing with the identity as the transformation matrix. When I switch my test case from instancing to a simple model, performance is identical across platforms.
  • Furthermore, profiling with nvvp shows that the instancing trace2() call is where the 35% is lost; the non-instancing trace() call delivers similar performance on both Windows and Linux.

Available cards are: GTX980 Ti, GTX970, GTX950, Tesla K40
OptiX versions tested: 3.9.1, 4.0.0
Driver version Windows: 368.81
Driver version Linux: 361.28

Looking forward to some hints on achieving similar performance on Windows when using the instancing capabilities of OptiX Prime.

Of course, I am happy to share more details on my test setup, if necessary.

Best,
Nico

Hello Nico,

A few clarifications:

  • Is this on Win10? If so, could you try the RS1 update and an up-to-date driver (e.g., Windows 10 Anniversary Update 64-bit, 372.54) and see if that helps?
  • So you see overall similar perf on Linux vs. Windows when not using instancing?
  • Could you provide us with an API trace so we can test your results locally? Information on capturing an API trace and sending it to us is below. You can send the trace to optix-help@nvidia.com

Thanks,
Keith

===============================================================================================
An OptiX API Capture (OAC) trace contains all OptiX and OptiX Prime API calls made by the application, across the whole run of the application, together with their input data and return values. Traces are most often used for sending bug reproducers to Nvidia, but can also be used to make application benchmarks and for developers to analyze and debug their applications.

To create a trace, set the environment variable OPTIX_API_CAPTURE to 1 and run the application. A directory called oac00000 will be created within the current working directory. If that directory already exists, the number will be incremented to avoid overwriting an existing trace.
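The capture step can be scripted. The following is a minimal Python sketch, not an official tool: it assumes trace directories are always named `oac` followed by five digits (as in `oac00000`), and the helper names `list_trace_dirs`, `capture_env` and `run_with_capture` are made up for illustration.

```python
import os
import re
import subprocess
from pathlib import Path

# Assumption: trace directories follow the "oac00000" pattern (oac + 5 digits).
TRACE_DIR_PATTERN = re.compile(r"^oac\d{5}$")


def list_trace_dirs(cwd="."):
    """Return the OAC trace directories that exist in cwd, sorted by name."""
    return sorted(
        p for p in Path(cwd).iterdir()
        if p.is_dir() and TRACE_DIR_PATTERN.match(p.name)
    )


def capture_env(base_env=None):
    """Copy an environment and switch on API capture in the copy."""
    env = dict(os.environ if base_env is None else base_env)
    env["OPTIX_API_CAPTURE"] = "1"
    return env


def run_with_capture(command, cwd="."):
    """Run the application with capture enabled; return the new trace dirs."""
    before = set(list_trace_dirs(cwd))
    subprocess.run(command, cwd=cwd, env=capture_env(), check=True)
    return [p for p in list_trace_dirs(cwd) if p not in before]
```

A call such as `run_with_capture(["./my_prime_test"])`, where `my_prime_test` stands in for your own executable, would then report which trace directory the run produced.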

Zip the trace directory and send it to Nvidia. We can only receive 10MB files by email, and the extension must be renamed to not be .zip. For larger files, use a file sharing service. We recommend contacting us and we will set up a temporary upload location.
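Packaging the trace for email could look like the Python sketch below. It is only an illustration: the function name `package_trace` and the replacement extension `.zipped` are arbitrary choices, and the 10MB limit is interpreted here as 10,000,000 bytes, which is an assumption.

```python
import shutil
from pathlib import Path

# Assumption: "10MB" is read as 10,000,000 bytes (the conservative reading).
EMAIL_LIMIT_BYTES = 10 * 1000 * 1000


def package_trace(trace_dir, suffix=".zipped"):
    """Zip a trace directory, rename the archive so it does not end in .zip,
    and report whether it is small enough to send by email.

    Returns (archive_path, fits_in_email).
    """
    trace_dir = Path(trace_dir).resolve()
    # Creates <trace_dir>.zip next to the directory, with the directory
    # itself as the top-level entry inside the archive.
    archive = Path(
        shutil.make_archive(
            str(trace_dir), "zip",
            root_dir=trace_dir.parent, base_dir=trace_dir.name,
        )
    )
    renamed = archive.with_suffix(suffix)
    archive.replace(renamed)
    return renamed, renamed.stat().st_size <= EMAIL_LIMIT_BYTES
```

If the second return value is `False`, the archive is over the email limit and a file sharing service or upload location is needed instead.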

When sending a trace to Nvidia, please put some effort into keeping it small. If the trace is for debugging, use the smallest effective dataset and the smallest number of frames necessary to reproduce the bug. If the trace is to become an OptiX benchmark, the dataset will be larger, but be very precise in the number of frames recorded.

The OAC trace directory contains a trace.oac file, which is a text file containing information about your platform, followed by all OptiX API calls, with their parameters and return values. You can, for example, search for occurrences of “rtContextLaunch” in this file to see all OptiX calls that are made per frame to make sure you’re not doing redundant setup work. The “oac.buf.” files in the trace directory contain binary buffer contents, both input and output. Each time a buffer is mapped its contents are stored to a new file. The “oac.ptx.” files contain PTX text of application-provided program code.
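Searching trace.oac can be done with any text tool. As a small Python sketch, the helper below (a hypothetical `count_api_calls`, not part of OptiX) counts how often a call name appears. It relies only on the file being plain text and makes no assumption about the layout of individual lines.

```python
def count_api_calls(trace_file, needle="rtContextLaunch"):
    """Count occurrences of an API call name in a trace.oac text file."""
    count = 0
    # errors="replace" keeps the scan going if the file has unexpected bytes.
    with open(trace_file, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            count += line.count(needle)
    return count
```

For example, `count_api_calls("oac00000/trace.oac")` gives the number of launch entries in the first trace of a working directory. Comparing that number with the number of frames rendered is one way to spot unexpected extra launches.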

Since OAC traces contain contents of all OptiX buffers, they contain the PTX form of the application’s shaders and other user program code, as well as the geometry and texture maps given to OptiX. While Nvidia will keep these secure and never share them, you will want to make sure that sharing a trace with Nvidia meets your institution’s policies.

Hey Keith,

Thanks for your fast reply. I just sent the OAC files via mail.
To answer your questions:

  • We are using Windows 7. I have updated the drivers to the latest version, 372.54, without any change in the results.
  • Exactly, when not using instancing, everything is fine!
  • Done.

Looking forward to hearing from you and your team!

Best
Nico

Thanks Nico. We are working to reproduce your slowdown now and will get back to you.

Keith

Quick summary:
We reproduced this with Nico’s help. It will be fixed in an upcoming release. Symptoms are exactly as described in the original post: slowness with Prime on Windows when instancing is enabled. Other configurations are not affected.

Nice catch, thanks to Nico for reporting this.

Confirmed fixed in 4.0.1 and listed in the release notes.