Quadro Sync II: Does Quadro Sync II support video card synchronization across multiple computers?

Does Quadro Sync II support video card synchronization across multiple computers? If so, how should it be connected and is there a maximum number limit?

Hello and welcome to the NVIDIA developer forums @461722946!

My interpretation of the User Guide is that you can have up to two separate systems, as long as they share a chassis, most likely because of the physical connector properties and cable lengths.

Additionally, if it is just about synchronization, every Quadro Sync II has an external timing source connection, allowing more complex setups as well.

I asked the experts if they could share some more insights, so stay tuned.

Thanks!

Hi,
the QSII can sync outputs from one and up to 4 GPUs in the same chassis/OS. The scanout will then be synchronized and the SwapAPI enabled, so any (NVSwapAPI-aware) application can render to the individual synchronized screens. If the individual threads of the app join the swap barrier, they can be made to wait for the busiest thread, so a buffer swap will happen on all screens at the same time/refresh.
When using more than one GPU, the app should be multi-GPU aware, e.g. use GPU affinity, so the rendering load can be distributed across all the GPUs…
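To make the swap-barrier idea above concrete, here is a minimal Python simulation (an illustration only, not the actual NVSwapAPI, which is exposed via OpenGL extensions): each render thread does an uneven amount of work per frame, then waits at a shared barrier, so no screen flips its buffer until the busiest thread is also done.

```python
import random
import threading
import time

def run_swap_group(num_screens=4, num_frames=3):
    """Simulate a swap barrier: all screens flip together each frame."""
    barrier = threading.Barrier(num_screens)
    lock = threading.Lock()
    log = []  # (frame, screen) in the order buffers were flipped

    def render_thread(screen):
        for frame in range(num_frames):
            time.sleep(random.uniform(0, 0.01))  # uneven render load
            barrier.wait()  # the "swap barrier": wait for the busiest thread
            with lock:
                log.append((frame, screen))

    threads = [threading.Thread(target=render_thread, args=(s,))
               for s in range(num_screens)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because every thread must reach the barrier before any is released, all screens complete frame N before any screen starts presenting frame N+1, which is exactly the "wait for the busiest thread" behavior described above.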
If a single node is overloaded with the complete rendering task and distributing the workload across multiple nodes is desired, then first of all the app must be cluster aware and provide a distribution and frame "re-sync" mechanism for the cluster. Each node can then have a GPU and a sync card; an (app) master node will distribute fractions of the workload to each node and keep all nodes updated with scene updates, and each node will render its fraction of the workload and canvas.

The QSII will again sync scanout across all configured screens and provide the SwapAPI. Each node now needs to join the swap group, so each node will wait with its buffer flip for the busiest node.

In this scenario, the sync cards need to be connected directly with CAT5 (or better) cables in a daisy chain. The sync master node (one of the rendering nodes, NOT the app master node) can be placed in the middle of two sync chains (2x RJ45 sync OUT), with each client sync node having one RJ45 as IN and one as OUT. We have tested up to 25 nodes/QSII in one chain, so going to around 50 nodes will not be a problem for the sync cards (still, cluster-aware apps must also be tested in such a large cluster setup…).
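The wiring rule above (sync master in the middle, two chains off its two sync OUT ports, every client with one IN and one OUT) can be sketched as a small hypothetical helper; this is just a planning illustration, not an NVIDIA tool:

```python
def sync_chain(nodes):
    """Plan the daisy-chain cabling: master in the middle feeds two chains.

    Returns (master, cables) where each cable is (from_OUT, to_IN).
    """
    mid = len(nodes) // 2
    master = nodes[mid]
    left, right = nodes[:mid], nodes[mid + 1:]
    cables = []
    # The master's two sync OUT ports each start one chain.
    for chain in (list(reversed(left)), right):
        prev = master
        for node in chain:
            cables.append((prev, node))  # prev's OUT into node's IN
            prev = node
    return master, cables
```

For five nodes `[0, 1, 2, 3, 4]` this picks node 2 as sync master and produces the cables (2→1), (1→0), (2→3), (3→4): every client node has exactly one IN and at most one OUT, matching the topology described above.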
Hope this answers your question? What setup are you aiming for, and what is your target application?
regards
-Frank


Thank you for your answer!
I need to render the same scene separately on multiple computers; each computer uses a camera to capture part of the scene, and the parts are stitched together. These scenarios need to be precisely synchronized. The current problem is that when the same frame is refreshed, there is a visible split between the different screens, similar to running without VSync, so I want to use the sync card to solve this problem.
In addition, I have a question: can Quadro Sync II be used with NVLink? What happens if I have four A5000 cards in my system, all connected to Quadro Sync II and using two NVLink bridges? In this case I am using Mosaic mode; will a single application (3D engine) be more efficient than Mosaic mode using Quadro Sync II alone?

Hi again… I'm not sure I fully understand your setup, so I struggle to answer your needs explicitly. All I can do is try to answer generically, and you then need to adapt it to your specific setup…
Cluster-aware apps manage, from a single master node, to split and distribute the rendering workload onto several nodes; they also keep all nodes updated with scene changes etc. They also make sure all nodes start rendering the 'first' frame at exactly the same time. From that point on, varying workload per node/GPU could let the frames get out of sync, with busy nodes staying behind less busy ones. That is where QSII and the use of our SwapAPI in the cluster-aware app can ensure that all nodes/GPUs will always wait for EVERY node to signal: a new frame is ready. If just a single node does not yet acknowledge that it is ready to present a new frame, all other nodes will still repeat the old frame. This way, no tearing between nodes/GPUs is guaranteed. [There will be app re-syncs for scene changes on top, but they are network synced and less timing sensitive…] This is the basis for a cluster to render in sync, without any camera feeds added yet!
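The "repeat the old frame until every node is ready" rule above can be reduced to a few lines; the sketch below is a conceptual model of that gating logic (not driver code), where each tick carries one readiness flag per node:

```python
def present_loop(ready_signals):
    """Gate frame advancement on ALL nodes being ready.

    ready_signals: list of ticks, each a tuple of per-node ready flags.
    Returns the frame number actually shown on every screen at each tick.
    """
    frame = 0
    shown = []
    for tick in ready_signals:
        if all(tick):
            frame += 1      # every node acknowledged: flip to the new frame
        shown.append(frame)  # otherwise all screens repeat the old frame
    return shown
```

With two nodes where node 1 misses the second tick, `present_loop([(True, True), (True, False), (True, True)])` yields `[1, 1, 2]`: the whole cluster holds the old frame while one node lags, so no screen ever runs ahead of another.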
Can you test and confirm that this already works for you?
I don’t understand how you use cameras to ‘capture’ the scene and ‘stitch together’. My understanding would be that the cluster-aware app creates a scene, and camera feeds could be used to fill textures in the scene? Or are you ‘just’ trying to display multiple camera feeds seamlessly across multiple screens? You would then need a video display app that is cluster aware AND capable of keeping multiple camera feeds in sync…
The gfx card driver can only keep frames in sync if an app uses the SwapAPI… (better said, it makes the app aware of when all nodes are ready for a buffer swap, so they swap buffers in sync…).
NVLink is a totally different technology and will not help here. It is a much faster data highway between 2 GPUs. It just happens to be that for a fast, low-latency data transfer between 2 GPUs, syncing their clocks is beneficial, so using NVLink also syncs the clocks of at most 2 GPUs. This can be used instead of a sync card for a 2-GPU Mosaic. ANY other sync setup will need the sync card!
Re your last line: Mosaic allows a single app to easily display synchronized across multiple screens of multiple GPUs in a single node. For 2 NVLink-able GPUs, NVLink can be used (only) to synchronize the GPUs. If there are more than 2 GPUs (still in the same node), then QSII needs to be used to sync all GPUs! Mosaic is limited to a node and cannot span multiple nodes… and Mosaic does NOT scale performance to more GPUs. A 2x2 Mosaic from a single GPU is the same speed as the same 2x2 Mosaic (same workload and screenspace) from 2 or even 4 GPUs. To scale GPU performance, your app needs to be multi-GPU aware and be able to split the workload and distribute it to individual GPUs explicitly.
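To illustrate the last point, here is a hypothetical sketch of what "split the workload and distribute it explicitly" means at its simplest: the app itself assigns Mosaic tiles to GPUs (e.g. to be rendered on per-GPU affinity contexts), rather than letting Mosaic replicate the full workload on each GPU. The helper name and round-robin scheme are illustrative only:

```python
def assign_tiles(num_gpus, grid=(2, 2)):
    """Explicitly partition a Mosaic tile grid across GPUs (round-robin).

    Returns {gpu_index: [(row, col), ...]} covering every tile exactly once.
    """
    rows, cols = grid
    tiles = [(r, c) for r in range(rows) for c in range(cols)]
    # Each GPU renders only its own tiles, so total work is divided,
    # not duplicated as it would be with Mosaic alone.
    return {gpu: tiles[gpu::num_gpus] for gpu in range(num_gpus)}
```

With 2 GPUs and a 2x2 grid, each GPU renders two of the four tiles; the partition is disjoint and complete, which is what lets the per-frame cost actually drop as GPUs are added.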
Raytracing, being a massively parallel compute task by nature, scales nicely across many GPUs, but rasterized rendering does NOT, nor does video (stream) display…
Maybe you can share more details of your project, to get more directed guidance?
regards