OpenXR abstraction vs direct driver integration for VRSS2, other optimization inquiry

Hey there!
There’s been a lot of discussion lately thanks to the renewed fervor over the VR revolution, and I was recently reminded of VRSS2’s support for the HP Reverb G2 Omnicept and potentially other Tobii-based eye tracking devices.

OpenXR provides an abstraction for gaze and eye tracking via the /user/eyes_ext user path (from XR_EXT_eye_gaze_interaction), and dynamic foveation via XR_FOVEATION_DYNAMIC_LEVEL_ENABLED_FB (from XR_FB_foveation).
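
(To make the abstraction concrete, here’s a minimal sketch of how an application reads gaze through XR_EXT_eye_gaze_interaction; instance/session setup, action set creation and attachment, and error handling are omitted, and appSpace is a placeholder:)

```cpp
// Sketch: reading gaze through XR_EXT_eye_gaze_interaction. Assumes the
// extension was enabled at instance creation and that gazeAction (an
// XR_ACTION_TYPE_POSE_INPUT action in an action set) exists; the action set
// is attached to the session only after this binding suggestion step.
#include <openxr/openxr.h>

XrSpace CreateGazeSpace(XrInstance instance, XrSession session,
                        XrAction gazeAction) {
    // Bind the pose action to the eye gaze interaction profile.
    XrPath profile, gazePose;
    xrStringToPath(instance, "/interaction_profiles/ext/eye_gaze_interaction", &profile);
    xrStringToPath(instance, "/user/eyes_ext/input/gaze_ext/pose", &gazePose);
    XrActionSuggestedBinding binding{gazeAction, gazePose};
    XrInteractionProfileSuggestedBinding suggested{XR_TYPE_INTERACTION_PROFILE_SUGGESTED_BINDING};
    suggested.interactionProfile = profile;
    suggested.countSuggestedBindings = 1;
    suggested.suggestedBindings = &binding;
    xrSuggestInteractionProfileBindings(instance, &suggested);

    // An action space that tracks the gaze pose over time.
    XrActionSpaceCreateInfo spaceInfo{XR_TYPE_ACTION_SPACE_CREATE_INFO};
    spaceInfo.action = gazeAction;
    spaceInfo.poseInActionSpace = {{0.f, 0.f, 0.f, 1.f}, {0.f, 0.f, 0.f}};
    XrSpace gazeSpace;
    xrCreateActionSpace(session, &spaceInfo, &gazeSpace);
    return gazeSpace;
}

// Per frame, after xrSyncActions: locate the gaze space in the app space.
XrPosef LocateGaze(XrSpace gazeSpace, XrSpace appSpace, XrTime displayTime) {
    XrSpaceLocation loc{XR_TYPE_SPACE_LOCATION};
    xrLocateSpace(gazeSpace, appSpace, displayTime, &loc);
    return loc.pose;  // check loc.locationFlags before trusting the pose
}
```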

I know that VRSS2 is driver level, but I wonder if there’s some benefit to NVIDIA’s driver listening to the OpenXR runtime so it could be more device agnostic and provide things like NIS for OpenVR and OpenXR without needing to inject at the compositor step.

I’m not sure about the full latency chain, but right now there seems to be a back-and-forth fight between GPU render latency and CPU render latency in applications with the potential for exceptionally high draw call counts via huge numbers of shaders, like VRChat.

Beyond that, I would love for NVIDIA to take another look at the optimizations that are possible to reduce the time to render a frame in OpenVR/OpenXR; right now it’s not so much about visual fidelity as about finding a way to provide performance improvements to DX11 apps in social VR.

In VRChat, for example, with an RTX 3090 there are VERY common situations where even a CPU like the 5800X with 32 GB of DDR4-3600 CL16 will be unable to break 20 fps at Steam’s recommended settings, and even after adjusting all AA levels and render settings, there is severe bottlenecking occurring somewhere. I wonder if that could be helped.

An engine optimization targeting that situation (huge numbers of shaders and materials) would be welcome, as VRChat is the most popular and most important VR application in 2022.

Hello DyLhun,

Variable rate shading works best when the application controls it. VRS works better for some render passes and effects than for others; it is content dependent, so the application’s domain knowledge is very helpful. For example, in VR you would want VRS for the environment, but not for text on menus (VRSS is the special case where you increase the quality, so it would be especially beneficial to apply it to text).
Where (in screen space) you can apply how much shading-rate reduction via VRS is something the XR runtime knows best, as this depends on the lenses used, etc.
So while in some cases we can apply VRS automatically, the future will be about better integration into applications.
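
For illustration, here is a minimal D3D12 sketch of what that per-draw control could look like; cmdList and the pass contents are placeholders, and the device must report VRS Tier 1 support via D3D12_FEATURE_DATA_D3D12_OPTIONS6:

```cpp
// Sketch: application-controlled per-draw VRS in D3D12 (Tier 1).
// Coarse shading where the content tolerates it, full rate where it doesn't.
#include <d3d12.h>

void RecordScene(ID3D12GraphicsCommandList5* cmdList) {
    // Environment pass: one shading invocation per 2x2 pixel block.
    cmdList->RSSetShadingRate(D3D12_SHADING_RATE_2X2, nullptr);
    // ... record environment draws ...

    // Menu/text pass: force full rate so glyph edges stay crisp.
    cmdList->RSSetShadingRate(D3D12_SHADING_RATE_1X1, nullptr);
    // ... record UI draws ...
}
```

Passing nullptr for the combiners leaves the per-draw rate in charge; Tier 2 hardware can additionally mix in a screen-space shading-rate image, which is where lens-dependent foveation fits.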

I’m unsure what you mean by “NVIDIA’s driver listening to the OpenXR runtime”. Anyone can listen in on the communication between the runtime and the application by providing an implicit OpenXR layer. Are you referring to this to get the gaze directions as they are queried by the application?
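
(For anyone following along: an implicit layer is just a JSON manifest plus a DLL exporting the loader negotiation entry point xrNegotiateLoaderApiLayerInterface. A rough sketch of the listening side, with the manifest and registration details omitted:)

```cpp
// Sketch: an implicit OpenXR API layer observing xrLocateSpace calls.
// During instance creation the loader routes xrGetInstanceProcAddr through
// the layer; the layer returns this wrapper for "xrLocateSpace" and keeps
// the next layer's (or runtime's) function pointer around.
#include <openxr/openxr.h>

static PFN_xrLocateSpace next_xrLocateSpace = nullptr;  // resolved at instance creation

XRAPI_ATTR XrResult XRAPI_CALL Hook_xrLocateSpace(
    XrSpace space, XrSpace baseSpace, XrTime time, XrSpaceLocation* location) {
    XrResult res = next_xrLocateSpace(space, baseSpace, time, location);
    // At this point the layer sees every pose the application queries,
    // including gaze action spaces, without touching the app or compositor.
    return res;
}
```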

" I would love for nvidia to take another look at some of the optimization that is possible to reduce the time to render a frame in OpenVR/OpenXR" This is 100% dependent on the application. Rendering to a 2D display or rendering for the OpenVR or OpenXR API does make no difference. There is nothing OpenXR specific why applications might be too slow: it’s just that a high framerate at high resolutions are required for VR.

NVIDIA Nsight Systems and Nsight Graphics can be used to find the bottlenecks in such applications. The first question would be whether the GPU or the CPU is the bottleneck. Especially with many draw calls, the one CPU thread that has to generate the command buffers for the rendering can be the bottleneck. In your example, 20 fps on a 3090 sounds like a CPU limitation. To test this hypothesis, I’d suggest lowering the resolution and seeing if the framerate stays unreasonably low. If so, it hints at a CPU bottleneck.
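
One way to put numbers on that split, besides the resolution test, is to compare wall-clock CPU frame time against GPU time from D3D11 timestamp queries. A rough sketch; renderFrame stands in for the application’s actual frame recording:

```cpp
// Sketch: measuring GPU frame time with D3D11 timestamp queries.
// If CPU frame time is much larger than this GPU time, the frame is CPU-bound.
#include <d3d11.h>

double MeasureGpuFrameMs(ID3D11Device* device, ID3D11DeviceContext* ctx,
                         void (*renderFrame)(ID3D11DeviceContext*)) {
    D3D11_QUERY_DESC qd{};
    ID3D11Query *disjoint, *tsBegin, *tsEnd;
    qd.Query = D3D11_QUERY_TIMESTAMP_DISJOINT;
    device->CreateQuery(&qd, &disjoint);
    qd.Query = D3D11_QUERY_TIMESTAMP;
    device->CreateQuery(&qd, &tsBegin);
    device->CreateQuery(&qd, &tsEnd);

    ctx->Begin(disjoint);
    ctx->End(tsBegin);          // timestamp before the frame's draw calls
    renderFrame(ctx);           // record the frame
    ctx->End(tsEnd);            // timestamp after
    ctx->End(disjoint);

    // Spin until results arrive (fine for profiling, not for shipping code).
    D3D11_QUERY_DATA_TIMESTAMP_DISJOINT dj;
    while (ctx->GetData(disjoint, &dj, sizeof(dj), 0) == S_FALSE) {}
    UINT64 t0, t1;
    while (ctx->GetData(tsBegin, &t0, sizeof(t0), 0) == S_FALSE) {}
    while (ctx->GetData(tsEnd, &t1, sizeof(t1), 0) == S_FALSE) {}

    disjoint->Release(); tsBegin->Release(); tsEnd->Release();
    return dj.Disjoint ? -1.0 : (t1 - t0) * 1000.0 / double(dj.Frequency);
}
```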

Thank you for the reply,

Oh, it’s most certainly a CPU bottleneck in most cases. In Unity 2019.4.31f1’s built-in renderer on DX11, I wish more of the material setup could be threaded or offloaded before being sent to the GPU. I guess it’s conventionally understood that DX11 is somewhat limited in the render threads it can run, but perhaps I don’t understand it deeply enough.
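
(For context on that DX11 limit: D3D11 does allow recording command lists on worker threads via deferred contexts, but submission still funnels through the single immediate context, which is roughly where the serialization comes from. A sketch of the mechanism, not of how Unity actually schedules it:)

```cpp
// Sketch: multithreaded command recording in D3D11 with deferred contexts.
// Draws are recorded in parallel, but ExecuteCommandList still happens on
// the one immediate context, which is where DX11 serializes.
#include <d3d11.h>
#include <thread>
#include <vector>

void RenderInParallel(ID3D11Device* device, ID3D11DeviceContext* immediate,
                      int workerCount) {
    std::vector<ID3D11CommandList*> lists(workerCount, nullptr);
    std::vector<std::thread> workers;
    for (int i = 0; i < workerCount; ++i) {
        workers.emplace_back([&, i] {
            ID3D11DeviceContext* deferred = nullptr;
            device->CreateDeferredContext(0, &deferred);
            // ... record this worker's share of material setup + draws ...
            deferred->FinishCommandList(FALSE, &lists[i]);
            deferred->Release();
        });
    }
    for (auto& t : workers) t.join();
    for (auto* cl : lists) {            // single-threaded submission
        immediate->ExecuteCommandList(cl, FALSE);
        cl->Release();
    }
}
```

In practice deferred contexts often yield limited gains, which is part of why DX12 and Vulkan moved command submission fully into application control.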

You mentioned:

Are you referring to this to get the gaze directions as they are queried by the application?

Yes, and I actually didn’t know it’s treated the same as 2D in the driver: that’s great info.
Is that also the case for fixed foveated rendering and VRSS? Are there cases where they would stop working or give up?

I’ll spend a little more time in Nsight for sure. That being said, you mentioned:

To test this hypothesis, I’d suggest lowering the resolution and seeing if the framerate stays unreasonably low. If so, it hints at a CPU bottleneck.

That’s absolutely true, but here’s where it gets unusual. Even lowering the resolution to 1600x1900 at 144 fps and CPU-limiting the render rate to reduce CPU load still yields situations where, with 60-80 players in an environment, even a 12900KF will get around 45-60 fps if you’re very lucky with a steep overclock on Windows 11. This is with motion smoothing and reprojection turned off.

However (and I suppose this forms the meat of my inquiry), quite strangely, if I open the SteamVR Dashboard, the system immediately jumps to full framerate, in a very unusual, almost “buggy” way. That doesn’t happen when manually setting the resolution excessively low before launching, to something like 800 px, which would usually mimic the “half resolution” step the dashboard applies to the running (not paused) application in the background.

This does not seem to be related to a specific headset’s intermediary runtime either, as the same thing occurs when using SteamVR or Oculus’s OVR runtime and dropping to the dash.

On the other hand, on a 144 Hz 1440p+ monitor in desktop mode on the same hardware, the game will run unconstrained at or above 144 fps with no problem in the same situation. I know this because I’ve pulled my cable while playing, and the game relaunched into the same situation in 2D.

Unfortunately, few if any applications are as advanced as VRChat is with Unity. This is compounded by everyday users creating new custom shaders every day, custom render textures (some using grab passes), and interacting with each other on the fly amid environment dynamics and post-processing so advanced they previously only existed in SIGGRAPH papers.

It’s incredibly hard to simulate this situation outside the live environment, and moving to something like HDRP or URP, as far as I know, isn’t an option for them due to the legacy content involved.

It’s a very weird and specific ask, I know, and I deeply appreciate the insight thus far.

For context, I work as a VR content and marketing developer who often partners with VRChat for our events, and we use assets from multiple studios that they bring to us to promote: for example, the game models from NieR and Guilty Gear, or the high resolution assets from Netflix’s Ghost in the Shell: SAC.

We would love to squeeze a few more frames out in VR, and after two years of A/B testing our own builds and software inside and outside of VRChat, I personally think this seems to be a “VR hardware specific miss” that we just can’t figure out. Basically, we run into the same issues with many materials as they do. I work on the optimization side too, and we do some over-the-top things like getting down to 50 or fewer SetPass calls with a single texture, hitting over 4000 fps in Unity editor play mode; but in the final app, in VR, once you get over 50 materials of any type with read/write enabled, it gets a little weird and chunky.