Mosaic drivers crashing on restart requiring reinstall of drivers

Hi everyone,

I have 2 Dell Precision 7865 Towers, each with an AMD Ryzen Threadripper Pro 5945X, 32GB RAM, and (2) NVIDIA A4000 graphics cards installed, for 8 total DisplayPort outputs on each PC. We are using 6 of the outputs (4 from one card and 2 from the other) to drive 6 screens. We created a 6-screen Mosaic array because our presentation runs at 1920x6480 (6 screens in portrait mode as one display). Initially we didn’t have NVIDIA Quadro Sync II cards installed, but we installed them when we started seeing the issues below - their presence doesn’t seem to have any effect on our system.

We can get the Mosaic set up and playing our presentation, but upon restart we have issues. The PC takes about 4-5 minutes to POST and never loads into Windows or launches the presentation. We have to power off the machine, pull the graphics cards out, and fall back to onboard graphics with one screen. Launching the NVIDIA Control Panel doesn’t work at that point, so we have to uninstall/reinstall the drivers, shut down, reinstall the graphics cards, reboot, and rebuild the Mosaic. All of that to have the presentation work fine until the PC next needs to reboot/shut down, and then everything happens all over again.

Our 6 DisplayPort outputs go into a Wolfpackgreen 4K 16x16 HDMI matrix switcher. From the switcher they go through Extron HDMI-over-Cat 6 extenders to the displays. The graphics driver crash occurs on site with the matrix switcher and extenders, but also in our test environment running directly to 6 displays. The same issue happens on both PCs. Mosaic appears stable on our onsite display until a restart/reboot.

Any help would be appreciated, and I can clarify/add information not present if needed.

Thank you.

Hi, that’s a bizarre one :-(.
Quick guess: try more symmetry. Connect 3 screens to each GPU, and use exactly the same connectors on both GPUs.
Keep the AMD integrated GPU disabled and UNINSTALL its driver; we used to see issues with a mix or leftovers of AMD GPUs and complex NV configs…
Do you use exactly the same DP-to-HDMI adapters for all 6 outputs? What EDID do you see being provided to our GPUs? I’d assume it’s not the real EDID from the final screen, but a ‘generic’ EDID provided by one of the devices in between…
If this is still an issue, I have a few more ideas and many more detailed questions; ping here and we should get in direct contact, pls…
thanks
-Frank

Frank,

I would love to hear any of your other ideas. I will be on site with this PC tomorrow 5/2/24 and would love to try some new approaches. I am particularly interested in being able to turn on SLI in the control panel and having it recognize all 6 displays, instead of it going down to a single display.
Thank you,
-Ian

Hi Ian, sorry I dropped the ball!
But in a way I’m glad I did, because now you’ve revealed that you are trying to enable SLI, which you SHOULD NOT! SLI will give you a single screen and tries to use the 2nd GPU to accelerate rendering on that single display. You want SLI DISABLED, then create the Mosaic; if there is a Sync card (or, where possible, an old SLI bridge or an NVLink bridge), it will be used automagically to add sync between the GPUs for the Mosaic.
Uninstall any AMD drivers, maybe even try to disable the AMD gfx device, so drivers don’t automagically reinstall from Windows Update…
On the EDIDs, we’d prefer to have the real EDIDs from the real screens, rather than what is typically a very generic EDID from a device in between. Those generic EDIDs tend to bloat the list of supported resolutions, which makes Mosaic and Windows stumble… So you might want to connect one of the real screens directly, then use Topology Viewer / Manage EDID to SAVE that EDID to a file. Then load that one file on all 6 outputs that are connected to the screens/switcher…
You should stay on a driver version that has been tested on the professional GPUs and applications, i.e. one from this list: NVIDIA RTX Driver Branch History for Windows | NVIDIA
I’d recommend trying the latest of the R535 and/or R550 drivers, so 538.49 or 552.22.
You can save the info from Topology Viewer to a file; pls provide that, preferably from the failing system as well as from the primary, working one.
Thanks
-Frank
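
For anyone following the EDID suggestion above, here is a minimal Python sketch for sanity-checking a saved EDID before loading it on all 6 outputs. It assumes the EDID has been saved as a raw binary dump; if the export is a text/hex file it would need converting to bytes first. The function name `summarize_edid` and the file names passed on the command line are hypothetical, not part of any NVIDIA tool. It only reads the standard 128-byte base block (fixed header, manufacturer ID, product code, extension count, checksum).

```python
import sys
from pathlib import Path

# Fixed 8-byte pattern that starts every EDID base block.
EDID_HEADER = bytes([0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00])


def summarize_edid(data: bytes) -> dict:
    """Summarize the 128-byte EDID base block of a raw binary dump."""
    if len(data) < 128:
        raise ValueError("EDID base block must be at least 128 bytes")
    base = data[:128]
    # Bytes 8-9: manufacturer ID, three 5-bit letters, big-endian, 1 = 'A'.
    raw = (base[8] << 8) | base[9]
    manufacturer = "".join(
        chr(((raw >> shift) & 0x1F) + ord("A") - 1) for shift in (10, 5, 0)
    )
    return {
        "header_ok": base[:8] == EDID_HEADER,
        # All 128 bytes of the base block must sum to 0 modulo 256.
        "checksum_ok": sum(base) % 256 == 0,
        "manufacturer": manufacturer,
        "product_code": base[10] | (base[11] << 8),  # little-endian
        "extension_blocks": base[126],
        "total_bytes": len(data),
    }


if __name__ == "__main__":
    # Usage (hypothetical file names): python edid_check.py real.bin switcher.bin
    for name in sys.argv[1:]:
        print(name, summarize_edid(Path(name).read_bytes()))
```

Running it on a dump taken directly from a real screen and on one taken through the switcher/extenders should show quickly whether the GPUs are seeing the same manufacturer and product code in both cases, or a generic EDID from a device in between.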