Calling NvAPI_D3D1x_Present on A6000 gives low performance

Hi all,

I’m developing cluster rendering software and want to synchronize all outputs using the so-called swap group/swap barrier functions in NVAPI (NVAPI R275-NDA-developer).

In my setup, two graphics workstations, each equipped with an A6000 and a Quadro Sync II, are connected in a daisy chain. Through the Nvidia Control Panel, all screens attached to the workstations are configured to synchronize properly. Everything looks good so far: the rendered output on all screens is synchronized as expected in full-screen mode.

But I noticed that when calling NvAPI_D3D1x_Present instead of IDXGISwapChain::Present (my program is written in DirectX 11), the fps drops to almost half. For example, my program may run at 22~24 fps with IDXGISwapChain::Present at 4K, but only at 11~12 fps with NvAPI_D3D1x_Present. When it switches back to windowed mode, both run at about 22~24 fps, but out of sync. The fps drops to half in full-screen mode even when the daisy chain is not connected.
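One way to narrow down a drop like this is to time the present call itself and check whether the extra time is spent blocking inside it. The following is a minimal, portable C++ sketch, not taken from the program in question: `timeCallMs` and `fpsFromFrameMs` are hypothetical helper names, and the callable passed in stands for whichever present call is being measured.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: run any callable and return how long it took, in milliseconds.
template <typename Fn>
double timeCallMs(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Convert an average frame time in milliseconds to frames per second.
inline double fpsFromFrameMs(double frameMs) {
    assert(frameMs > 0.0 && "frame time must be positive");
    return 1000.0 / frameMs;
}
```

In a render loop this would wrap the present call, for example `double ms = timeCallMs([&] { swapChain->Present(1, 0); });`. For reference, a 45 ms frame is about 22 fps and a 90 ms frame is about 11 fps, so if the time inside the call grows by roughly one frame time in full-screen mode while the rest of the frame stays the same, the wait is happening in the present path rather than in rendering.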

I thought it might be a driver bug and upgraded the driver to the latest version, 496.49 (NFE), but without luck.

The program was also tested on another workstation (equipped with an RTX A5000 but without a Quadro Sync II card); there the fps was the same in both windowed and full-screen mode, around 17~18 fps.

I googled around but found nothing about this.

Any ideas? Thank you in advance!

Here is some extra information about the workstations.
OS: Windows 10 20H2
CPU: AMD Ryzen 7 5800X 8-core @ 3.8 GHz
Motherboard: Gigabyte X570 AORUS MASTER
RAM: G.SKILL 128 GB DDR4 2666 (32 GB x 4)

Hi all, I’m still waiting for a solution. If anybody knows something about the problem, would you please provide some hints? Any suggestion would be appreciated. Thank you in advance.

Hi, did you ever get this fixed?
I even get a crash when calling NvAPI_D3D1x_Present() instead of the original present function.
There might be some issues with Windows interfering with swapping behavior (see ‘PresentMon’); DWM intervenes in swapping depending on the visibility of overlays like the Xbox Game Bar or the taskbar (in other words, you’re not completely fullscreen).

I managed to get a bit further: I had been calling NvAPI_D3D1x_Present in the wrong place, which triggered an infinite recursion. I’m integrating QuadroSync into Unity and have similar issues.
My current workflow:

  • add a native render plugin to Unity
  • hook DX11’s Present function (using Kiero)
  • when NvAPI has joined the swap group, replace the original Present function with NvAPI_D3D1x_Present calls. Before, I just called the original Present, and that failed to neatly synchronize the Quadro card outputs. The NvAPI wrapper seems to do some of the synchronization, progressing towards a mutually equal clock.
  • since Unity doesn’t provide the swapchain to native plugins, it takes a frame for me to catch it. That’s why it was a bit of a hassle to decide when to start calling the wrapper NvAPI_D3D1x_Present instead of the original one. Not calling the original one just stalls the entire thing. The Present function is now called twice per frame: once by Unity and once by me in the native plugin (NvAPI_D3D1x_Present, which calls Present). I can’t stop the one from Unity, but I can detect that it’s not mine and then just skip it. Otherwise my framerate is halved (hm, that might be a clue actually!).
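The skip-and-forward logic in the last two steps can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual plugin code: `Result`, `SwapChain`, `OriginalPresent`, `SwapGroupPresentStandIn`, `PluginPresent` and both flags are hypothetical stand-ins so that the sketch compiles without the Windows SDK, NVAPI or Kiero. It assumes, as described above, that the NVAPI wrapper calls Present internally and therefore lands in the hook.

```cpp
#include <cassert>

// Stand-ins so the sketch compiles without the Windows SDK or NVAPI headers.
// In a real plugin these would be HRESULT and IDXGISwapChain*.
using Result = long;
constexpr Result kOk = 0;
struct SwapChain { int presentedFrames = 0; };

// Hypothetical flags: set once the swap group has been joined, and while the
// plugin's own present call is on the stack.
bool g_joinedSwapGroup = false;
thread_local bool t_insidePluginPresent = false;

// Stand-in for the trampoline to the original, unhooked Present.
Result OriginalPresent(SwapChain* sc, unsigned /*syncInterval*/, unsigned /*flags*/) {
    ++sc->presentedFrames;
    return kOk;
}

// Hook installed over Present; every caller lands here, including the wrapper.
Result HookedPresent(SwapChain* sc, unsigned syncInterval, unsigned flags) {
    if (t_insidePluginPresent) {
        // Reached via the plugin's wrapper call: forward to the real Present
        // instead of calling the wrapper again, which would recurse forever.
        return OriginalPresent(sc, syncInterval, flags);
    }
    if (g_joinedSwapGroup) {
        // The engine's own Present: skip it so the frame is presented only once.
        return kOk;
    }
    // Swap group not joined yet: behave like the unhooked function.
    return OriginalPresent(sc, syncInterval, flags);
}

// Stand-in for NvAPI_D3D1x_Present; assumed to call Present internally,
// which ends up in the hook above.
Result SwapGroupPresentStandIn(SwapChain* sc, unsigned syncInterval, unsigned flags) {
    return HookedPresent(sc, syncInterval, flags);
}

// Called by the plugin once per frame after the swap group has been joined.
Result PluginPresent(SwapChain* sc, unsigned syncInterval, unsigned flags) {
    assert(!t_insidePluginPresent && "PluginPresent must not be re-entered");
    t_insidePluginPresent = true;
    const Result r = SwapGroupPresentStandIn(sc, syncInterval, flags);
    t_insidePluginPresent = false;
    return r;
}
```

The flag is thread-local on the assumption that both present calls happen on the same render thread. If the engine's call were forwarded as well once the swap group is joined, each frame would be presented twice, which matches the halved framerate mentioned above.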

I can render at 180Hz in certain situations. When I go really fullscreen, it drops to 130-150Hz (so not 50% like in your case).

Some ideas:

  • There are new presentation modes in newer Windows versions, see https://www.reddit.com/r/pcgaming/comments/6ukc1z/tip_disable_full_screen_optimizations/ for example. DX11 didn’t know about these so Windows tries to inject these modes into your fullscreen app. The modes have varying latencies and throughputs, so the mode you’d like varies depending on the quality you need most in your rendering (low latency or high framerate).
  • Check out PresentMon, which displays the presentation mode for each graphical window (for fullscreen it may record a csv or something so you can check after the fact).
  • Try displaying the taskbar or the Game Bar. Does the framerate increase? If so, that is due to DWM. Windows 10 seems to always enable vsync in that case, at the cost of added latency. For me, displaying the taskbar brings rendering back up to 180Hz, even though it’s now compositing in the taskbar graphics.
  • I believe I read somewhere that Unreal recommends turning off ‘Fullscreen optimizations’ for unreal.exe
  • DX12 does have support for the new modes; if that’s a viable option, try that (but be sure you have control over the mode that Windows selects, to avoid giving up too soon)
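
For the PresentMon suggestion above, a recorded csv can be summarized per presentation mode with something like the sketch below. It is a rough illustration, not a PresentMon tool: it assumes the capture has `PresentMode` and `msBetweenPresents` columns (matched case-insensitively, since the exact header spelling may differ between PresentMon versions), it splits naively on commas without handling quoted fields, and `summarizePresentModes` and `ModeStats` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Per-mode totals; meanMs() is the average time between presents.
struct ModeStats {
    int frames = 0;
    double totalMs = 0.0;
    double meanMs() const {
        assert(frames > 0 && "no frames recorded for this mode");
        return totalMs / frames;
    }
};

// Naive split on commas; does not handle quoted fields.
static std::vector<std::string> splitCsvLine(std::string line) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
    std::vector<std::string> fields;
    std::istringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) fields.push_back(field);
    return fields;
}

static std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Group rows by the assumed PresentMode column and sum the assumed
// msBetweenPresents column; returns an empty map if either column is missing.
std::map<std::string, ModeStats> summarizePresentModes(std::istream& csv) {
    std::map<std::string, ModeStats> stats;
    std::string line;
    if (!std::getline(csv, line)) return stats;

    const std::vector<std::string> header = splitCsvLine(line);
    std::size_t modeCol = header.size(), msCol = header.size();
    for (std::size_t i = 0; i < header.size(); ++i) {
        const std::string name = toLower(header[i]);
        if (name == "presentmode") modeCol = i;
        if (name == "msbetweenpresents") msCol = i;
    }
    if (modeCol == header.size() || msCol == header.size()) return stats;

    while (std::getline(csv, line)) {
        const std::vector<std::string> fields = splitCsvLine(line);
        if (fields.size() <= std::max(modeCol, msCol)) continue;  // short row
        char* end = nullptr;
        const double ms = std::strtod(fields[msCol].c_str(), &end);
        if (end == fields[msCol].c_str()) continue;  // not a number
        ModeStats& s = stats[fields[modeCol]];
        ++s.frames;
        s.totalMs += ms;
    }
    return stats;
}
```

Feeding it an `std::ifstream` opened on the capture gives one entry per mode that occurred; comparing `meanMs()` between, say, a flip mode and a composed mode shows whether the framerate change lines up with Windows switching presentation modes.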