Advice for an 8-GPU system: two S1070s on a single motherboard, or not?

Hi all,

I’m about to buy a supercomputer with 4 Teslas, but I might need more power. In fact, 8 GPUs would be just perfect. However, I’m not sure how to do it correctly.

I was thinking about connecting two S1070s to a single motherboard to end up with 8 GPUs, but then each card is 8x only :(

Any ideas? Advice?

Thank you!


Well, it depends - such a machine could be excellent for very compute-bound jobs (full N-body integrations, for instance). Hardware-wise, I believe that people have sometimes had trouble putting that many GPUs onto one motherboard. In general though, putting together supercomputer clusters is a non-trivial task (at least, if you want a decently working cluster). To get better advice, you’ll have to be a bit more specific about what you want this cluster to do.

I want to use it for both computation and visualization. The applications are environmental simulations, like modelling the sea surface radiance. I have a first version working on a single Quadro, which I would like to expand.


A setup with four GTX 295s is possible, and would be a little more flexible to deploy than the S1070 setup (and probably cheaper). Be sure to read all the disclaimers, not least because they’re pretty witty.


Using THREE GTX 295s is a lot easier than 4. You don’t need as fancy a motherboard or case, you have a single-width slot left for a small display GPU, you may be able to get by with a 1000 watt PSU (they get much more expensive above 1000 watts), and you’re less likely to end up in BIOS fights for every OS.

The hardest part for four cards tends to be the case, actually. The standard is 7 slot cutouts, and you need 8, so your options get limited really fast.
The Lian Li PC-P80 case seems to be the best solution for 8 slot support… it’s rare, but I can’t name any other stock cases.

Probably this Antec. I haven’t seen it in real life, but it looks like it can be used for 4x dual-GPU cards.

Also, you’d better be sure you’re not bound by host-memory transfer bandwidth, depending on the board you put the cards in. Lots of these so-called 3 and 4 way 16x boards can’t pump all 16 lanes to each slot at once (an Intel IOH only provides 36 lanes). So they end up using some sort of PCI-E switch instead, and you have to be careful which slots you depend on for bandwidth at the same time, balancing use across the PCI-E switches.
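
To make the lane arithmetic concrete, here is a rough sketch. The 36-lane figure is the one quoted above; the function name and the idea that every slot wants a full 16 lanes at once are just illustrative assumptions, not a description of any particular board.

```python
# Rough lane budget: can one IOH feed every slot at full width at once?
IOH_LANES = 36        # lanes provided by one Intel IOH (figure from the post)
LANES_PER_SLOT = 16   # a full-width x16 slot

def lane_shortfall(num_slots, available=IOH_LANES, per_slot=LANES_PER_SLOT):
    """Return how many lanes are missing if all slots run at full width."""
    return max(0, num_slots * per_slot - available)

for slots in (2, 3, 4):
    needed = slots * LANES_PER_SLOT
    print(f"{slots} slots need {needed} lanes, shortfall: {lane_shortfall(slots)}")
```

Any non-zero shortfall has to be covered by a PCI-E switch or by running some slots narrower, which is why it matters which slots are busy at the same time.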

Additionally, even if you’re just using 32 lanes, that still saturates the uni-directional QPI bandwidth of 12.8 GB/sec (assuming 2 QPI links). So you have up to an 18 GB/sec need, assuming uni-directional transfer, and a 12.8 GB/sec pipe to host memory. In real-world tests, I was able to push only 10 GB/sec to host memory in one direction, confirming this. This was with a single-socket X58 though; future multi-socket, multi-IOH chipsets may better balance PCI-E slots over them.
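
A back-of-the-envelope version of that comparison, as a sketch only. The 0.5 GB/sec per lane per direction is my assumption for PCI-E 2.0; the 12.8 GB/sec QPI figure is the one from the post. The helper names are made up for illustration.

```python
# Compare aggregate PCI-E demand with the QPI pipe to host memory.
PCIE2_GB_PER_LANE = 0.5   # assumption: PCI-E 2.0, ~0.5 GB/sec per lane, one direction
QPI_GB_PER_SEC = 12.8     # uni-directional QPI bandwidth quoted in the post

def pcie_demand(lanes, per_lane=PCIE2_GB_PER_LANE):
    """Peak one-direction bandwidth (GB/sec) if all lanes transfer at once."""
    return lanes * per_lane

def is_qpi_bound(lanes, qpi=QPI_GB_PER_SEC):
    """True when the lanes could ask for more than QPI can deliver."""
    return pcie_demand(lanes) > qpi

for lanes in (16, 32, 36):
    print(f"{lanes} lanes: {pcie_demand(lanes):.1f} GB/sec demand, "
          f"QPI-bound: {is_qpi_bound(lanes)}")
```

These are theoretical peaks; measured throughput will be lower, as with the 10 GB/sec figure above.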

Regarding the multiple S1070 hookup: the bus isn’t hard-divided into 8 lanes per GPU; rather, it’s switched. So you can get near 16x performance if you’re only accessing a single GPU per slot.
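
As a very simplified model of what "switched" means here: the sketch below assumes the x16 link is split evenly among whichever GPUs behind the switch are transferring at that moment. Real switches won't share this cleanly, and the function name is hypothetical.

```python
# Toy model: an x16 host link shared by the GPUs behind one PCI-E switch.
def effective_lanes_per_gpu(active_gpus, link_lanes=16):
    """Lanes' worth of bandwidth each active GPU sees, assuming an even split."""
    if active_gpus < 1:
        raise ValueError("need at least one active GPU")
    return link_lanes / active_gpus

print(effective_lanes_per_gpu(1))  # one GPU transferring: the whole link
print(effective_lanes_per_gpu(2))  # both GPUs transferring: half each
```

So the 8x figure is the worst case when both GPUs on a link transfer together, not a fixed limit per GPU.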

This is the board I hooked 8 S1070’s to btw:

Caveat: this board, though dubbed “supercomputer”, is still a desktop board and has the stupid BIOS requirement for a video card to be present. Since it’s PCI-E only, this means one of the 4 x16 slots must be driven down to 8x to share with the video card.

I strongly support Steve’s advice. We have 2 desktops: a simpler, cheaper one from Intel with 3 GTX 295s, and a fancier, more expensive one from AMD with 4 GTX 295s.

We recently disabled the fourth card on the AMD board.

I find it better to only use 3 of them.