Quadro Sync II clarification on NvAPI Swap Group, Swap Barrier, and Present Barrier in DirectX 11 vs 12

Hi everyone,

I’m currently working on a multi-machine synchronization setup using NVIDIA Sync cards, and I need some clarification regarding the differences and capabilities of Swap Groups, Swap Barriers, and Present Barriers in DirectX 11 and DirectX 12.

Here’s what I understand so far:

Swap Group & Swap Barrier (DX11):

  • In DirectX 11, it’s possible to join a swap group, and then bind a swap barrier.
  • This seems to ensure that the swap chains present frames simultaneously, but it doesn’t seem to block machines if one takes significantly longer to render a frame.
  • Question: Is this behavior correct? Is using NvAPI_D3D1x_Present any different from using the normal DX11 Present? Do Swap Barriers in DX11 synchronize the display refresh but not enforce frame-by-frame blocking, even if the “Blocking” parameter is set to true? Or is 0.5 seconds too long? Will it only block for a few frames and then ignore the hanging machine?

Present Barrier (DX12):

  • The Present Barrier is only available in DirectX 12, and from what I’ve gathered, it might ensure that all machines block until every participant submits its frame. This would guarantee perfect frame lock even if one machine takes longer to render.
  • Question: Is it correct that Present Barriers enforce stricter synchronization compared to Swap Barriers in DX11? What is the difference between the Swap Barrier and Present Barrier and how should they be used?

Expected Behavior During Frame Freezes:

  • In some tests, when simulating a 0.5-second freeze on a client or server in DX11, only the machine with the freeze was affected, while others continued to present frames.
  • The frame counter seemed to be hardware-based and continued updating on all machines.
  • Question: Does this mean that DX11 Swap Barriers don’t enforce blocking synchronization even if the “Blocking” parameter is set to true? Or is there any timeout when using swap barriers and/or present barriers?

Hi,

  • Question: Is this behavior correct? Is using NvAPI_D3D1x_Present any different from using the normal DX11 Present? Do Swap Barriers in DX11 synchronize the display refresh but not enforce frame-by-frame blocking, even if the “Blocking” parameter is set to true? Or is 0.5 seconds too long? Will it only block for a few frames and then ignore the hanging machine?

That sounds like the barrier isn’t working. When apps are in the barrier, it should block the present of every app joined to it until the last one finishes rendering, whereupon they should all present together.
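To make the expected behavior concrete, here is a toy timing model. It is not NVAPI code and not the driver's algorithm; the function names, the integer-millisecond timing, and the 10 ms refresh period are all assumptions chosen for easy arithmetic. It compares when each machine flips a frame with a working barrier versus with no barrier, using the 0.5-second freeze from the test above.

```python
def next_vblank(t_ms, refresh_ms):
    """First vblank at or after t_ms (integer milliseconds)."""
    return -(-t_ms // refresh_ms) * refresh_ms


def simulate_presents(render_ms, refresh_ms=10, barrier=True):
    """render_ms[m][k] is the render time of frame k on machine m.

    Returns present_ms[m][k], the time frame k is flipped on machine m.
    Assumes a blocking present: a machine starts frame k+1 only after
    frame k has been flipped.
    """
    machines = len(render_ms)
    frames = len(render_ms[0])
    start = [0] * machines
    presents = [[] for _ in range(machines)]
    for k in range(frames):
        ready = [start[m] + render_ms[m][k] for m in range(machines)]
        if barrier:
            # Everyone waits for the slowest participant, then flips together.
            flip = next_vblank(max(ready), refresh_ms)
            flips = [flip] * machines
        else:
            # Each machine flips on its own next vblank.
            flips = [next_vblank(r, refresh_ms) for r in ready]
        for m in range(machines):
            presents[m].append(flips[m])
            start[m] = flips[m]
    return presents


# Machine 1 freezes for 500 ms on its second frame.
render = [[5, 5, 5], [5, 500, 5]]
print(simulate_presents(render, barrier=True))   # [[10, 510, 520], [10, 510, 520]]
print(simulate_presents(render, barrier=False))  # [[10, 20, 30], [10, 510, 520]]
```

In this model, the second output (only the frozen machine is delayed) matches what was observed in the test, which is what you would expect when the barrier is not actually engaged.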

This does assume the Quadro Sync II cards have all been configured correctly and are indicating that they are in sync, either in the Topology Viewer or via the LEDs on the board itself. It’s worth checking that.

If the Quadro Sync II cards are configured correctly, then a couple of other factors may cause the behavior you describe. The first is that swap group and barrier only work with full screen applications (no task bar on top), so if the app is not full screen, what you describe would occur. Even if the app is full screen, something can sometimes prevent it from entering the barrier, like the task bar being on top or the desktop window manager being active. To see if this is the cause, use the configureDriver.exe utility available for download from our website. It contains an option (#7) that turns on an indicator showing whether desktop composition is operating. With this option turned on, if the swap group and barrier are shown as pending, then you would see the behavior you are observing.

Finally, there is another configureDriver.exe option (#11) that turns on what we call pre-present wait. With it, even in a swap group and barrier, the present call returns immediately, and under the hood the driver ensures all the presents are aligned. The reason for this option is to let the application move on to the next frame, since blocking after the present would otherwise result in a significant performance penalty. In this case, if the application presents another frame before the previous frames have scanned out, the application will block until the group of first frames has scanned out.
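As a rough illustration of why pre-present wait helps, the sketch below models when a present call returns to the application in each mode. This is a simplified reading of the description above, not the driver's real scheduling: the function names, the integer-millisecond timing, the 10 ms refresh period, and the rule that each frame takes a strictly later vblank than the previous one are all assumptions.

```python
def next_vblank(t_ms, refresh_ms):
    """First vblank at or after t_ms (integer milliseconds)."""
    return -(-t_ms // refresh_ms) * refresh_ms


def present_return_times(render_ms, refresh_ms=10, pre_present_wait=False):
    """render_ms[m][k] is the render time of frame k on machine m.

    Returns (returns, flips):
      returns[m][k] is when machine m's present call for frame k returns,
      flips[k] is the shared flip time of frame k on all machines.
    """
    machines = len(render_ms)
    frames = len(render_ms[0])
    free_at = [0] * machines
    returns = [[] for _ in range(machines)]
    flips = []
    for k in range(frames):
        submit = [free_at[m] + render_ms[m][k] for m in range(machines)]
        # Assumed: a frame cannot flip on the same vblank as the previous one.
        earliest = flips[-1] + 1 if flips else 0
        flip = next_vblank(max(max(submit), earliest), refresh_ms)
        for m in range(machines):
            if pre_present_wait:
                # Return at once unless the previous frame is still pending.
                prev_flip = flips[-1] if flips else 0
                ret = max(submit[m], prev_flip)
            else:
                # Blocking present: return only once the group has flipped.
                ret = flip
            returns[m].append(ret)
            free_at[m] = ret
        flips.append(flip)
    return returns, flips


render = [[4, 4, 4]]
print(present_return_times(render))                         # ([[10, 20, 30]], [10, 20, 30])
print(present_return_times(render, pre_present_wait=True))  # ([[4, 10, 20]], [10, 20, 30])
```

In both modes the frames flip at the same times. The difference in this model is that with pre-present wait the application gets control back earlier and can work one frame ahead.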

Present barrier is an updated API that replaces swap groups and barriers and is available for both Vulkan and DX12. Essentially, the distinction between group (i.e. between GPUs within a system) and barrier (between GPUs across systems) was somewhat arbitrary from the application’s perspective; it doesn’t really make a difference. Additionally, we took the opportunity to make the non-blocking behavior the default, for performance.

Under the hood they both rely on the same hardware sync mechanism and effectively do the same thing; it’s just that the present barrier is more streamlined and flexible.

  • Question: Does this mean that DX11 Swap Barriers don’t enforce blocking synchronization even if the “Blocking” parameter is set to true? Or is there any timeout when using swap barriers and/or present barriers?

The frame counter is indeed a hardware counter exposed by the Quadro Sync II board (it counts frames by itself). The fact that some systems continued would seem to indicate that the swap/present barrier mechanism is not engaged, which, as I mentioned, could be because the Quadro Sync II cards themselves are not synchronized, or because the barrier is not engaged due to the desktop window manager or the application being windowed.
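One way to check this from logs is sketched below. It assumes a hypothetical log format in which each machine records the hardware frame counter value at every present, and that the counter values are comparable across machines (for example, because the counters were reset together). With a working barrier, all machines should record the same counter value for the same application frame.

```python
def find_desync(logs, tolerance=0):
    """logs maps a machine name to a list of hardware frame counter values,
    where entry k was sampled when that machine presented its frame k.

    Returns a list of (frame_index, {machine: counter}) for every frame
    where the machines differ by more than `tolerance` counts.
    """
    frames = min(len(series) for series in logs.values())
    mismatches = []
    for k in range(frames):
        counts = {name: series[k] for name, series in logs.items()}
        if max(counts.values()) - min(counts.values()) > tolerance:
            mismatches.append((k, counts))
    return mismatches


# Hypothetical data: the client stalls for about 30 refreshes on frame 1
# and the server does not wait for it.
logs = {"server": [100, 101, 102], "client": [100, 131, 132]}
for frame, counts in find_desync(logs):
    print(frame, counts)
```

An empty result means the machines presented in lock-step for every logged frame. Any mismatch, as in the hypothetical data above, suggests the barrier was not engaged.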

thanks,
Ian (NVIDIA)