Server Motherboards for multi-GPU systems (& Fermi)


Anyone have any recent experience with building an extreme CUDA system using 4x 295s or similar? What are everyone’s general thoughts on the most robust system for these goals as of November 2009?

I read a previous thread which described building a custom case around the ASUS “P6T7 WS SuperComputer” motherboard.…&templete=2

I am however leaning towards a more pre-fab solution, such as those offered by SuperMicro such as the “SuperServer 7046GT-TRF”…-7046GT-TRF.cfm

SuperMicro also offers a newer motherboard that offers 7(!) PCIe 2.0 x8 slots:…00/X8DTH-6F.cfm

It is unclear, though, whether a double-wide card still obscures the adjacent slot. From the picture, the slots seem to be spaced a little wider apart than normal, but I am not sure if it is possible to get 7 double-wide cards in this mobo, or “only” 4. Anyone know? Of course, powering 7 cards without burning your office down is probably another challenge in and of itself…

Also this mobo uses x8 slots, instead of x16. Does it really matter much for CUDA?

And finally, anyone have a crystal ball that can inform me how these systems/mobos will do with Fermi?

I suppose the bottom line question is: what is currently the best hardware platform to host the max number of cards possible, and should we expect this to change at all with the release of Fermi? Do these choices from SuperMicro effectively represent the bleeding edge of what is possible at the moment, and will they remain good choices for Fermi systems?

Thanks for any insights into these matters…

The SuperServer looks reasonable for 4 GPUs. (You can also find Antec and Lian-Li cases with enough rear slots to hold 4 GPUs if you want to use the P6T7 motherboard. I used the custom case because I didn’t want a huge tower just for a GPU server.)

There is no way the last SuperMicro motherboard (with the 7 PCI-E slots) can hold 7 double slot cards. Those slots are too close together. That board is almost certainly designed for servers which need a lot of network cards and/or disk controllers, which are single slot cards.

Generally speaking, the limitations on multi-GPU systems are physical volume, power, heat, and bandwidth:

  • Volume: It looks like for the near future, high end GPUs will continue to be double-slot cards, so most ATX motherboards in combination with a large case will max out at 4 GPUs.

  • Power: The highest power consumption GPUs are the GTX 295 cards, which are rated to draw up to 289W. The highest power PSU you can find which can run on a standard 110V 15-20A plug seems to be around 1350W. That pretty much also limits you to 4 GPUs unless you go to 240V supplies, or dual supplies that plug into different wall circuits.

  • Heat: Well, this is easy. Get good fans. :)

  • Bandwidth: The number of independent PCI-Express lanes in a single CPU system seems to be limited to about 36, which means you can only supply the maximum x16 bandwidth to two GPUs. When you install 3 or 4 GPUs, the motherboard has to downgrade some of the slots to x8. Whether this affects you depends entirely on the specific CUDA applications you want to run. Many CUDA programs load a bunch of data onto the card, then operate on it with minimal CPU-GPU communication for a long time. These sorts of programs are not affected by PCI-Express bandwidth and would be fine in a 4 GPU system. Other programs might not do so well.
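To put rough numbers on the power and bandwidth points above, here is a small sanity-check sketch. The GPU and PSU ratings are the figures quoted in this thread; the system-overhead wattage and the achieved PCIe transfer rates are guesses for illustration, not measurements.

```python
# Rough sanity checks for a 4x GTX 295 build.
# GPU_TDP_W and PSU_W are the figures quoted above; SYSTEM_OVERHEAD_W
# and the bandwidth numbers are made-up ballpark values.

GPU_TDP_W = 289          # rated max draw per GTX 295
N_GPUS = 4
SYSTEM_OVERHEAD_W = 150  # CPU, disks, fans, etc. -- rough guess
PSU_W = 1350             # largest common PSU for a 110V 15-20A circuit

total_draw = N_GPUS * GPU_TDP_W + SYSTEM_OVERHEAD_W
print(f"Estimated draw: {total_draw} W of {PSU_W} W "
      f"({PSU_W - total_draw} W headroom)")

def transfer_fraction(data_gb, kernel_s, bw_gbs):
    """Fraction of total runtime spent moving data at a given link rate."""
    xfer_s = data_gb / bw_gbs
    return xfer_s / (xfer_s + kernel_s)

# Example: load 1 GB once, then run kernels for 60 s. For this
# load-once, compute-long pattern the x8 penalty is negligible.
for label, bw in (("x16", 6.0), ("x8", 3.0)):
    pct = 100 * transfer_fraction(1.0, 60.0, bw)
    print(f"{label} (~{bw} GB/s): {pct:.1f}% of runtime in transfers")
```

For a kernel that re-transfers its data every few milliseconds the x8 fraction would dominate instead, which is the “other programs might not do so well” case.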

If you go to dual socket systems, there are motherboards with two X58 chipsets (one for each socket) that can do four GPUs at x16 bandwidth in a NUMA-like configuration.

So, basically, 4 GPUs is the practical limit for a single computer using available parts unless you get really creative (and have lots of money). This is unlikely to change for Fermi, as I expect they will continue to be dual slot cards with PCI-Express 2.0 x16 interfaces drawing less than 300W per card (the limit if you draw 75W from the PCI-E slot, 75W from a 6-pin PCI-E connector, and 150W from an 8-pin PCI-E connector).

It is possible that the power draw could go as high as 375W if NVIDIA was willing to go to two 8-pin PCI-E power connectors + slot power. In that case, any current GPU system rated for 4 GTX 295 cards would only be able to drive 3 Fermi cards. I seriously doubt NVIDIA will try to push the power envelope past 300W, as this would make it harder for people to deploy 110V 1U Tesla systems with 4 GPUs like they have now.
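The per-card limits above are just sums of the PCIe power budget figures, which a few lines of arithmetic make explicit (the 1350 W PSU division ignores everything else in the box, so it is an upper bound only):

```python
# PCI-Express power budget per card, using the spec figures quoted above.
SLOT_W = 75        # power from the PCIe slot itself
SIX_PIN_W = 75     # 6-pin auxiliary connector
EIGHT_PIN_W = 150  # 8-pin auxiliary connector

current_limit = SLOT_W + SIX_PIN_W + EIGHT_PIN_W   # slot + 6-pin + 8-pin
dual_8pin_limit = SLOT_W + 2 * EIGHT_PIN_W         # hypothetical slot + 2x 8-pin

print(current_limit)     # 300
print(dual_8pin_limit)   # 375

# Cards per 1350 W supply, ignoring the rest of the system:
print(1350 // current_limit)    # 4
print(1350 // dual_8pin_limit)  # 3
```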

Thanks for the detailed reply…

And one of these four cards must drive video too, correct? So really “3.5” cards for CUDA is the max in a single computer, right?…4U/MNL-1185.pdf

Actually, it seems that in addition to supporting the 4 x16 cards, this system can also support one x4 PCIe 2.0 card and one x4 PCIe 1.0 card! (It also looks like there is still an open x8 PCIe 2.0 slot, but the manual does not state that it can be used at the same time as the 4 x16 slots, so probably it cannot be…)

So I guess the 4 x16 slots could be dedicated to processing, and the x4 slot could supply video?

(assuming there was enough power for all of this… there might not be…)

This thing seems to be a pretty insane system! Is anyone here using this already?

Well, that depends on your operating system. All of my CUDA systems run Linux, where you can turn off the GUI and get complete control over all of the CUDA devices. Even in the case where a card is having to run the GUI, as long as your kernels take less than 5 seconds each, you can still make pretty good use of all devices in a system. The card managing the GUI is only going to be less efficient if you are actually using the display actively while it is also running CUDA code.
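On the “kernels under 5 seconds” point: if a launch might trip the watchdog on the display GPU, you can split the work into fixed-size chunks. A sketch of the bookkeeping, where the throughput figure is a hypothetical placeholder (measure your own kernel first):

```python
# Sketch: size kernel launches to stay well under the ~5 s watchdog on a
# display-attached GPU. ELEMENTS_PER_S is a made-up placeholder for your
# kernel's measured throughput.

WATCHDOG_S = 5.0
SAFETY = 0.5              # aim for half the watchdog limit
ELEMENTS_PER_S = 200e6    # hypothetical measured throughput

max_chunk = int(ELEMENTS_PER_S * WATCHDOG_S * SAFETY)
print(max_chunk)  # 500000000

def chunks(n_total, chunk):
    """Yield (offset, count) pairs covering n_total elements."""
    off = 0
    while off < n_total:
        yield off, min(chunk, n_total - off)
        off += chunk

# A 1-billion-element job becomes two ~2.5 s launches instead of one
# ~5 s launch that risks the watchdog:
for off, count in chunks(10**9, max_chunk):
    print(f"launch kernel on elements [{off}, {off + count})")
```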

That seems reasonable. You can certainly find a low power card to drop into the x4 slot.

If you are running Windows XP (not sure if later systems fix this), I believe you need to make sure that the x4 card is also an NVIDIA device (obviously of much lower power) that is recent enough to be supported by the current device driver. XP cannot load more than one video device driver at a time. (Windows is far outside my expertise area, so I’m going off memory of comments in the forums.)

Although, again, I wouldn’t worry too much about offloading the video unless you plan to use the system interactively while it is computing, or if you think your kernel calls will trigger the watchdog timer.

I am currently trying to build a system with 4 GTX 295 cards and a P6T7 WS motherboard. I installed Ubuntu 9.04 (64-bit) and CUDA driver 190.18.

I was able to plug all 4 cards into the x16 slots (so they physically fit, unless you also want to connect the front-panel USB headers), but the system was not detecting them.
Then I was able to get the system working correctly with only 2 GTX 295 cards (4 GPUs in total), but after adding a 3rd card the system broke: X Windows does not load and CUDA cannot detect any CUDA-enabled device (however, all 3 cards appear in lspci).
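For anyone comparing what the OS sees against what CUDA reports, here is a quick sketch that counts NVIDIA display controllers in `lspci` output. The sample text below is hypothetical; in practice you would feed it the real output of `lspci` on your box.

```python
# Count NVIDIA display controllers in lspci's one-line-per-device output.
# Note each GTX 295 shows up as TWO entries (one per GPU on the card).

def count_nvidia_gpus(lspci_text):
    """Count NVIDIA VGA/3D controller lines in lspci output."""
    return sum(
        1
        for line in lspci_text.splitlines()
        if ("VGA compatible controller" in line or "3D controller" in line)
        and "nvidia" in line.lower()
    )

# Hypothetical lspci output for two GTX 295 cards plus other devices:
sample = """\
02:00.0 VGA compatible controller: nVidia Corporation GT200b [GeForce GTX 295] (rev a1)
03:00.0 VGA compatible controller: nVidia Corporation GT200b [GeForce GTX 295] (rev a1)
05:00.0 VGA compatible controller: nVidia Corporation GT200b [GeForce GTX 295] (rev a1)
06:00.0 VGA compatible controller: nVidia Corporation GT200b [GeForce GTX 295] (rev a1)
00:1f.2 IDE interface: Intel Corporation 82801JI (ICH10 Family)
"""
print(count_nvidia_gpus(sample))  # 4
```

If this count disagrees with what `deviceQuery` reports, the cards are enumerating on the bus but the driver is failing later, which narrows down where to look.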

I read some posts mentioning a possible 5-GPU limit in Linux (3 GTX 295s have 6 GPUs in total). Does anybody know if this is true, and if there is any way to fix it?

Thanks for the help!

No, the GPU limit is at least 8 GPUs.

It would be awesome if someone with experience in building & supporting web sites ran a wiki or something similar, where people could report on successful builds and problems pertinent to NVIDIA-based GPGPU hardware platforms and relevant OS tweaks. From the financial standpoint, that web site could be ad-supported (e.g. by motherboard and power supply vendors or online stores), or NVIDIA-supported (after all, NVIDIA might actually be interested in making life easier for GPGPU integrators and computing enthusiasts).

To answer the inevitable question, unfortunately, I am not a web designer.

We did that once for Tesla PSC configs (which I had actually tested), but that’s all pretty out of date now. Since I’ve moved to software and now actually fix CUDA bugs instead of just whining about them, I have more pressing things to do. I also didn’t feel good about making lists of hardware I hadn’t tested, as there are all sorts of weird BIOS issues to worry about.

Wiki-based information does not necessarily have to be fully NVIDIA-tested and/or endorsed. Being community-tested is still very useful. This is how Linux hardware compatibility lists are often maintained, and I find those extremely useful: I won’t buy a non-trivial piece of hardware for a Linux machine unless it is on such a list.

I have a P6T7 motherboard with 4 GTX 295 cards installed, and I see all 8 devices. (Or at least I did. The system is partially disassembled now while I do some other testing.) I don’t recall exactly which driver I was using, but it was fairly recent. One major difference is that I was using RHEL5 64-bit rather than Ubuntu.

I would like to move to Ubuntu 9.04 in the future, but can’t install it right at the moment.

I am from RenderStream. We are a full system provider of high performance workstations and clusters for rendering and scientific computing. We are always looking for ideas for better designs and would definitely appreciate your input, but for the moment I wanted to let all of you know that we have a full line of workstations and rackmount servers using 4x GTX 295 or Tesla. Our systems are ISO 9001 compliant. The OS and CUDA are loaded, tested, and released after a 24-hour burn-in at 94 degrees (F). Our website shows systems with XP64, but we also do Linux and, of course, dual boots. For cluster computing our systems can use InfiniBand. We really want to see people developing programs, not playing hardware Tetris; please check us out. Thank you!

I am also considering building a multi-GPU system, and I was thinking about possibly pushing it past the 4x GTX 295 boundary. With water cooling, you can slim those GTX 295 cards down to one PCI slot, so in theory one could install 7 cards into this P6T7 motherboard. Question: is there any sort of technical limitation that would prevent me from using 14 GPUs for CUDA? Forget about powering and cooling all that - let’s assume that is taken care of.

I certainly haven’t tested 14 GPUs. Good luck finding a BIOS that can handle it, though.

Yeah, this ventures into the realm of diminishing returns. It would be vastly easier and cheaper to build two systems with 4 cards each. If I were a motherboard manufacturer I might try a 7x GTX 295 system as a torture-test of board performance (and a publicity/advertising stunt), but I don’t think I would build such a system to do computation on.

I don’t think any of the current APIC designs for single socket chipsets are going to handle the volume of interrupts that 7 “GPUs” could generate all that well.

This is a good point. While torture-testing my 4xGTX 295 system, I saw a kernel warning every few hours about missed interrupts from the GPUs. (The system was also under extremely high load and virtual memory usage with heavy swapping to an SSD.) It didn’t seem to affect the results, but made me nervous for sure.

Nervous people at “Los Alamos National Laboratory” make us all nervous ;)

Good point. Although it probably depends on what type of computation you are running and the per-GPU bandwidth needed for that.

Will probably try to set up a test for this in a month or two.