Supermicro X9DRG-QF (SOLVED)

Anyone had a chance to try Supermicro’s quad GPU X9DRG-QF? Looks like it can support a 5th double width card for GUI duties. Any BIOS or compatibility issues?

http://www.supermicro.com/products/motherboard/Xeon/C600/X9DRG-QF.cfm

Update: I got this board for a second system to check the portability of an OpenCL port of CUDA solvers. With the original 1.x BIOS it ran so well (with 4 x Tahiti HD 7970) that I decided to make it my main CUDA rig with 4 GTX Titans. Supermicro support said GK110 would need the latest v3.0 BIOS, so I got the system up to date, bought 4 Titans, and held my breath: added the first Titan, loaded the latest WHQL driver and updated CUDA to v5.5. As hoped, everything just worked. Warm fuzzy feelings. Added the 2nd Titan: perfect. All tests completed, scores off the chart, more warm fuzzies.

Then the fun began. These first two Titans were in PCIE slots attached to first CPU (CPU1 Slot2 and CPU1 Slot4). The other two slots are attached to 2nd CPU (CPU2 Slot6 and CPU2 Slot8). When I installed the 3rd Titan in CPU2 Slot6, any CUDA calls to this 3rd card caused instant BSOD. Same with 4th card in CPU2 Slot8.

Funny thing was, the other two cards (attached to CPU1) still ran any CUDA code fine.

Supermicro had just released BIOS R3.0a, and tech support thought this would help. After installing this, with 3 or 4 cards installed, I got no BSODs, but sadly no CUDA calls worked. When I take out the cards attached to CPU2, all is good.

Using Win7 x64, NV driver 332.21, 128 GB DDR3.

PS: with all 4 Titans installed, deviceQuery “PASSES”, and reports for each card:

Device 0:3 : “GeForce GTX TITAN”
CUDA Driver Version / Runtime Version 6.0 / 5.5
CUDA Capability Major/Minor version number: 3.5

Only odd thing was PCI Bus ID / location ID:

[Device 0:] Device PCI Bus ID / PCI location ID: 132 / 0
[Device 1:] Device PCI Bus ID / PCI location ID: 3 / 0
[Device 2:] Device PCI Bus ID / PCI location ID: 2 / 0
[Device 3:] Device PCI Bus ID / PCI location ID: 131 / 0

Do those “Device PCI Bus ID’s” for device 0 and 3 look right? I think device {1,2} were on CPU1, and device {0,3} were on CPU2.
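For what it's worth, those bus IDs read more naturally in hex. Here is a rough Python sketch that converts them and guesses the socket. It assumes (this is not confirmed for this board) that the BIOS gives CPU2 the upper half of the PCI bus range, starting at 0x80, as is common on dual-socket boards. The names `CPU2_FIRST_BUS`, `guess_socket` and `group_by_socket` are made up for illustration.

```python
# Bus IDs exactly as deviceQuery printed them (decimal), keyed by CUDA device index.
BUS_IDS = {0: 132, 1: 3, 2: 2, 3: 131}

# ASSUMPTION: buses 0x00-0x7F hang off CPU1 and 0x80-0xFF off CPU2.
# This split is typical of dual-socket boards; it is not taken from Supermicro docs.
CPU2_FIRST_BUS = 0x80


def guess_socket(bus_id: int) -> str:
    """Guess which CPU a PCI bus belongs to under the assumed split."""
    return "CPU2" if bus_id >= CPU2_FIRST_BUS else "CPU1"


def group_by_socket(bus_ids: dict[int, int]) -> dict[str, list[int]]:
    """Return CUDA device indices grouped by guessed socket."""
    groups: dict[str, list[int]] = {"CPU1": [], "CPU2": []}
    for dev, bus in sorted(bus_ids.items()):
        groups[guess_socket(bus)].append(dev)
    return groups


if __name__ == "__main__":
    for dev, bus in sorted(BUS_IDS.items()):
        print(f"Device {dev}: bus {bus} = 0x{bus:02X} -> {guess_socket(bus)}")
    print(group_by_socket(BUS_IDS))
```

If that assumption holds, 132 (0x84) and 131 (0x83) sit on CPU2 while 3 and 2 sit on CPU1, which would agree with my guess that devices {1,2} are on CPU1 and {0,3} on CPU2.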

This system (7047GR-TRF, board X9DRG-QF) is certified for quad Tesla K20 and K40 (GK110), so I’m wondering if the UEFI BIOS in the GTX Titan may be causing some issue. It has been working so well with two Titans that I’ve not bothered sorting this out, but the time has come to engage 4 cards. I’ll have time later this week to follow this up.

Any ideas?

Ohmygawd, a Maxwell 750 Ti fits into that 5th double width slot of the X9DRG-QF

Good news! After reading about this Win7 Registry setting:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E968-E325-11CE-BFC1-08002BE10318}

over in this thread:

http://devtalk.nvidia.com/default/topic/738070/cuda-setup-and-installation/8x-gpu-gtx-issue-under-windows/

I found that my entire history of 11 GPUs was still listed as “existing devices”!? The simple fix was to back up the Registry, uninstall the Nvidia driver, then (in VGA mode) delete all the “devices” in that Registry section, i.e.:

/0000/…, /0001/…, /0002/…, /0003/…, … /0009/…, /0010/….
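Before deleting anything, it may help to see what is actually in there. Here is a read-only Python sketch that lists the numbered subkeys of the display-adapter class key along with their `DriverDesc` value. It uses only the standard `winreg` module, changes nothing, and the helper names (`device_subkeys`, `list_display_entries`) are mine, not part of any Nvidia or Microsoft tool. I'm assuming each entry carries a `DriverDesc` value; the code falls back to a placeholder if it does not.

```python
import re
import sys

# Display-adapter class key (path relative to HKEY_LOCAL_MACHINE).
CLASS_KEY = (r"SYSTEM\CurrentControlSet\Control\Class"
             r"\{4D36E968-E325-11CE-BFC1-08002BE10318}")


def device_subkeys(names):
    """Keep only the numbered per-device subkeys (0000, 0001, ...)."""
    return sorted(n for n in names if re.fullmatch(r"\d{4}", n))


def list_display_entries():
    """Read-only: return (subkey, DriverDesc) pairs. Windows only."""
    import winreg  # standard library, available only on Windows

    entries = []
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CLASS_KEY) as cls:
        n_subkeys = winreg.QueryInfoKey(cls)[0]
        names = [winreg.EnumKey(cls, i) for i in range(n_subkeys)]
        for name in device_subkeys(names):
            with winreg.OpenKey(cls, name) as dev:
                try:
                    desc, _ = winreg.QueryValueEx(dev, "DriverDesc")
                except FileNotFoundError:
                    desc = "(no DriverDesc value)"
            entries.append((name, desc))
    return entries


if __name__ == "__main__":
    if sys.platform != "win32":
        sys.exit("This listing only works on Windows.")
    for name, desc in list_display_entries():
        print(name, desc)
```

If the list shows more entries than there are cards in the box, the extras are likely leftovers of the kind described here. The script only reads; the deletion itself is still a manual step, after backing up the Registry.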

Before re-installing driver or adding hardware, I first had to boot to Safe mode and use Device Manager to “Update driver”, creating default VGA entries (in the Registry section I just deleted) for each of the two Titans already in the box.

Next, I installed the current Nvidia driver and CUDA 6 Toolkit to check all was still ok, then plugged in the extra two Titans and re-installed the Nvidia driver. Pleased to report… everything just worked.

Current job has 4 MPI procs, each using 5.2 GB of their Titan’s 6 GB device mem. Adding an extra 60 GB host memory for hi.res isosurfing makes a good first test for this near silent desktop PC.
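A back-of-the-envelope check of that memory budget, as a Python sketch. The numbers are the ones quoted in this thread. The round-robin rank-to-GPU mapping in `device_for_rank` is only an assumption for illustration, not necessarily how the solver picks its device, and "host left" ignores whatever host memory the solver itself uses.

```python
N_RANKS = 4            # MPI processes in the current job
N_GPUS = 4             # GTX Titans installed
PER_RANK_GB = 5.2      # device memory used by each process
GPU_MEM_GB = 6.0       # device memory per Titan
HOST_TOTAL_GB = 128.0  # installed DDR3
HOST_EXTRA_GB = 60.0   # extra host memory for the isosurface step


def device_for_rank(rank: int, n_gpus: int = N_GPUS) -> int:
    """Hypothetical round-robin mapping of an MPI rank to a CUDA device."""
    return rank % n_gpus


def budget() -> dict[str, float]:
    """Summarise device and host memory use, rounded to 0.1 GB."""
    return {
        "device_total_gb": round(N_RANKS * PER_RANK_GB, 1),
        "headroom_per_gpu_gb": round(GPU_MEM_GB - PER_RANK_GB, 1),
        # Ignores host memory used by the solver processes themselves.
        "host_left_gb": round(HOST_TOTAL_GB - HOST_EXTRA_GB, 1),
    }


if __name__ == "__main__":
    for rank in range(N_RANKS):
        print(f"rank {rank} -> device {device_for_rank(rank)}")
    for key, value in budget().items():
        print(f"{key}: {value}")
```

So the job occupies about 20.8 GB of the 24 GB total device memory, leaving roughly 0.8 GB of headroom per card.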

Nice work, Supermicro + Nvidia.

PS: of course, the old “Reinstall Windows” trick would have fixed this too.

Thanks for the update. Windows really does tend to leave stray plug-and-play entries lodged in the registry… everything from USB devices to graphics cards.

I’m curious that you say near silent, but a review on newegg of that tower/motherboard/dual redundant power supply combo mentions:

“Fans are a bit loud, even at idle. Heck, even when the system is powered OFF, the PSU fans make significant noise. If you ever get the system fans to spin up to 100%, the whole thing makes a high-pitched whine not unlike a jet engine. Overall, the constant hum doesn’t bother me, and is offset by the fact that the fans do move a lot of air.”

Or do you just have the board in some other case?

Same case (7047GR-TRF). To make it quiet, set BIOS fan option to “Balanced”.

With this option, fans only spin up when the Xeons get hot. Since my Xeons don’t get hot, fans don’t spin up. All I can hear at the moment is the soothing woosh of 4 Titan fans at 80%. One trick: I have the box lying horizontal with an 18 cm fan (Silverstone AP182) moving ambient air onto the quadruplets, plus a gap above the warm rear ends of the Titans as passive outlet. The AP182 is silent.

PS: when 7047GR’s are used as nodes in a cluster, the full-Monty fan mode is just right. Also, there’s underlying firmware that allows just two of the four mid-case fans to spin up when the GPU region gets hot.

Not sure I understand this: did you install an extra fan inside the case? Why does it matter if the case is horizontal?

By the way, I wonder if that motherboard would fit into any other cases? Any ideas? There are 10-slot cases out there, but I don’t know if everything would line up.

I replaced the solid steel side door of the 7047GR-TRF chassis with a light panel to make it easy to fiddle with heat flow. That 18 cm fan sits outside (on top of) the horizontal case, pulling ambient air down onto the four Titans. Also, since the outflow end of each Titan gets warm, I have a gap for that heat to flow up and out.

I originally wanted to put the X9DRG board into a MountainMods U2-UFO, but whereas a board like the ASUS Z9PE-D8 WS is only 12" x 13" (30.5 cm x 33 cm), this one is a SM-special: 15.2" x 13.2" (38.6 cm x 33.5 cm).

nnunn,

I see. What does it sound like when all GPUs are under full load?

Just got this board, but I will be installing it in a Rosewill Blackhawk Ultra HPTX case. Will report back on how it goes.