Currently, I have a host machine with four RTX A4000 professional graphics cards that I am using to test NVIDIA's Mosaic mode. The four cards are connected to a total of 16 4K displays of the same model. Using Mosaic mode, we successfully combined the 16 displays into one large screen.
However, in this scenario, with mixed CUDA and Direct3D 11 programming, Mosaic mode can only use the primary graphics card (the first card) for rendering. The second and remaining cards cannot be used for rendering; they only provide video output and contribute neither compute power nor graphics memory. Essentially, one graphics card is driving all 16 screens. Decoding one 8K video is smooth, as is decoding two, but there is noticeable lag when decoding three or four 8K videos, which falls far short of the capabilities of four professional graphics cards.
We would like to use all four graphics cards together to decode a 16K video. How can we solve this problem?
This sounds like a very challenging problem. I seem to remember some limitation on the side of these RTX workstation GPUs, but I am not sure. I will reach out to the team working on Mosaic and see if they can give some advice.
Meanwhile, I hope you don't mind if I move this into the dedicated Mosaic category, as that is where people track these questions.
In short: it is possible to use all the GPUs in Mosaic mode with CUDA; however, it is not possible with DX11.
The reason is that, by design, Mosaic mode creates what's called a logical GPU, which is exposed to the operating system and applications. When an unmodified application runs, it executes on the logical GPU and the driver sends work to all the GPUs in the Mosaic group with displays attached. For CUDA → DX11 interop, it depends on the exact set of commands whether the driver will send the CUDA work to all GPUs, or execute CUDA on a single GPU and then copy any data used by DX11 to all GPUs.
CUDA is what we call an "explicit" API, which means that even though there is a logical GPU in Mosaic mode, all the physical GPUs are exposed through the CUDA API, so applications explicitly written to take advantage of all CUDA devices in a system get access to all the GPUs. DX11, however, is not explicit and does not expose the physical GPU devices; it only sees the single logical GPU.
DirectX 12, Vulkan, and the OpenGL multicast extension are also explicit graphics APIs and expose all the GPUs in the system, so they can definitely be used in Mosaic mode. However, the application needs to be explicitly written to use all GPUs.
When in Mosaic mode, the GPUs are in what's called Linked Display Adapter (LDA) mode, which is referenced in the Microsoft documentation.
Hope this helps clarify.
Thanks a lot! I’m looking forward to the feedback.
Thank you for the clarification. I was wondering: are libraries like nvdecoder and nvjpeg explicit APIs as well? Do they need to be rewritten in order to work with DX12?
Yes, by virtue of using the CUDA API those libraries are explicit. However, if you mix an explicit API (e.g. CUDA) with an implicit API (like DX11), there can be inefficiencies, because the driver is trying to satisfy competing requests: targeting specific GPUs for the explicit API while making everything look like one big GPU for the implicit API.
Also, with explicit graphics APIs you need to know the mapping from physical display regions to GPUs so you can ensure each GPU draws the correct picture.
To improve efficiency, each of my graphics cards currently decodes only the video that needs to be displayed on the screens plugged into it, to avoid duplicate copying between graphics cards. However, in Mosaic mode, if the entire screen is treated as a single unit, will it be necessary to copy all the decoded data to the logical card in order to display it properly, even if I use DX12 to decode the data on each physical graphics card? Or, with an explicit API, can each screen get its own window independently of Mosaic mode?
If you use DX12 in Mosaic mode, even though there is one big (borderless) window covering all displays, you can use viewports so that an individual display driven by a GPU only needs to update the viewport corresponding to it. In this case, the decoded content for that display only needs to exist on the GPU with the display, and you don't need to copy the decoded content anywhere else. You can use the CopyTextureRegion method, which uses the GPU copy engine, to copy into the back-buffer region corresponding to the viewport.
Mosaic mode has the advantage that the GPUs are in LDA mode, which means peer-to-peer (P2P) copies are available between GPUs, so any GPU can read and write any other GPU's memory. You can therefore also have GPUs with no display attached simply decoding; the resulting decoded textures can then be copied using the GPU's copy engine to any other GPU to update the appropriate viewport, for example if rendering is taking too many resources away from decode. The "Maximum performance" option when setting up Mosaic mode will include all GPUs, with or without displays, in the LDA group.
A further advantage of Mosaic mode versus individual displays is that, since there is a single window across all displays, there is also a single Present call to execute the back-buffer flip across all displays, and this executes using hardware flip locking, meaning that all the GPUs' presents are synchronized in hardware.
I have a question. Say I take three graphics cards and connect nine screens to make a 10K Mosaic screen. If I then attach a separate low-resolution monitor to the fourth graphics card, will two displays (one 10K, one 4K or 2K) appear in Windows display settings? Can I set them to duplicate mode, and if so, can I switch freely between the duplicate and extend modes provided by Windows?
Another question: on Windows 10, Mosaic mode is said to support only a 16-screen Mosaic, while Windows 7 could support a 32-screen Mosaic. But fewer and fewer hardware devices support Windows 7 now. What should you do if you want to tile 32 screens on Windows 10?