Best motherboard for 3+ Teslas? ASUS P6T7 WS SuperComputer or Tyan S7025?

The company I am working for is planning to build a supercomputer using at least 2 Teslas (preferably 3). We have most of the components figured out already, but we are still debating what is the best motherboard to use. We have narrowed it down to the ASUS P6T7 WS SuperComputer or the Tyan S7025.

I believe the Tyan will support up to 4 Teslas, but I think this will only work well under Linux, as the board has ASPEED onboard video as opposed to NVIDIA. I don’t think the ASUS board has onboard video at all; however, I have been reading that it has 2 NF200 chips, which, if I understand correctly, are supposed to increase the maximum available PCIe lanes on the board from 36 to 64, though I don’t know whether these chips have anything to do with video. I am not sure how many PCIe lanes the Tyan board has, so I would appreciate it if anyone could tell me.

Further, I know that the ASUS board has 7 PCI-e slots, however from what I have been hearing it is not possible to fit more than 3 double-width cards due to the spacing of the slots. My question is whether or not it is possible to fit 3 double-width cards (i.e. Teslas) and 2 single-width cards (a display card and a NIC) in the PCI-e slots.

I am also curious as to how much of a difference there will be in terms of the speed of the RAM, as it seems the Tyan board only supports up to DDR3 1333 RAM whereas the ASUS board supports up to DDR3 2000 RAM.

Further, I was wondering whether or not it would be possible to put Core i7 processors in the Tyan board rather than Xeon 5500 series processors. The Tyan web page says that the Xeon 5500 series is supported; however, it’s an LGA 1366 board rated for 130 W TDP CPUs, so it seems like it should theoretically be able to support Core i7 processors, provided the BIOS supports them.

If anyone has any information regarding these questions, your input would be appreciated.

Thanks,

-Crispy

A PC with a couple of GPUs is not the same as a Cray or Blue Gene or any machine like that. The term “supercomputer”, as used by x86 vendors, is puffery and should not be repeated by non-marketing types.

Check out the photos at the bottom of the page in this forum thread, but note they are GTX 295s (single-PCB version, water-cooled), not Teslas.

I can try to shed some light on the quick issues here. The biggest difference between the two boards is that the Tyan is dual socket. As far as I know, that means you have to use Xeons, not i7s, because the i7 doesn’t support multiple CPUs in one system (it’s an Intel market-segmentation thing that has to do with the extra QPI link between chips, I believe). Both boards have physical x16 slots in positions 1, 3, 5, and 7, but unless you have a case that has 8 or more slots (there are a few around) you can’t put a double-wide card in the bottom slot (#7).

I’m not sure about the Tyan, but I remember that the P6T7 has four x16 electrical connections from slots 1, 3, 5, and 7. Whether either board uses NF200 chips to multiplex two or more of those electrical connections down to one x16 link to the CPU(s) and main memory, I’m not sure. That could bottleneck your system, depending entirely on the code being run.

I know you can’t have another graphics driver installed in Windows when you run CUDA, but you may be able to disable the onboard ASPEED graphics in the Tyan BIOS to avoid that problem, if you have another NVIDIA card installed (GeForce or Quadro, not Tesla). No guarantee there.
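Once the system is up, one quick sanity check (a minimal sketch against the CUDA runtime API, nothing board-specific assumed) is just to enumerate the devices and confirm the Teslas show up regardless of which card is driving the display:

```cpp
// Minimal device-enumeration sketch using the CUDA runtime API.
// It simply lists whatever GPUs CUDA can see on the system.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s, %d multiprocessors, %.0f MB global memory\n",
               i, prop.name, prop.multiProcessorCount,
               prop.totalGlobalMem / (1024.0 * 1024.0));
    }
    return 0;
}
```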

If you want three Teslas in the P6T7, you can either put them in slots 1, 3, and 5 and leave slot 7 open for a display card (a single-slot card would let you fit in a 7-slot case), or you could use 3, 5, and 7 for the Teslas in an 8+ slot case and have 1 and 2 open for a single-slot display card and a NIC. Are you sure you need an external NIC? The onboard gigabit Ethernet controllers should work fine, unless you really need Fibre Channel or InfiniBand or something. I ask because I’m not sure whether adding a card to slot 2 will take PCIe lanes from one of the Teslas.

System memory speed may or may not matter, and the Tyan board can probably reach 2000 MHz anyway, because the memory controllers are on the CPU with LGA1366 chips, not on the motherboard. Anything over 1333 just isn’t supported by Intel, and so even Asus usually lists it as supported through overclocking only. Most server board customers aren’t going to be looking for that.

Hope that helps some.

The P6T7 is equipped with 2x NF200. Slot 7 has a fixed 16 lanes. Slots 1, 3 & 5 run at 16 lanes if slots 2, 4 & 6 are not used.

Does anybody here have experience with performance using NF200 slots versus 16 lanes directly from the X58? I’m somewhat worried about bandwidth limitations with two cards running through one NF200, and about latency issues with one or two cards. Is there another motherboard chipset option (i.e. non-Intel) that provides 48 or 64+ PCIe lanes for 3 or 4 GPUs? Would that much bandwidth saturate any main memory bus anyway?

There aren’t any PC chipsets which offer more than 38 total PCIe lanes without resorting to an external PCI Express switch ASIC like the NF200. The Intel X58 supports 36 total lanes, and the AMD 790FX supports 32 GPU lanes plus an extra 6 lanes for the southbridge.

I can give/post some numbers next week when I’m back in the office. I have a direct comparison between 4x GTX 295s installed on an ASRock X58 board with a fixed 8 lanes per slot and the P6T7, with full 16-lane bandwidth per slot but routed through the NF200s.

The bottom line is this: the NF200 does add a little latency, but for the moment it isn’t really significant. If you had 16 lanes and split them directly between 2 slots, you’d have a fixed 8 lanes per slot. By plumbing the 16 lanes from the X58 into the NF200 instead, they are dynamically available to either of the 2 slots, so if one card can use more than 8 lanes it will do so. You can’t get out more than you put in, so if one slot tries to use the full 16-lane bandwidth, the other can’t. Call it switching, call it interleaving, call it what you will, but you can’t get more out than you put in to start with.

Here’s the deal, though: if you are transferring large amounts of data to/from the cards (certainly for my application), routing them via the NF200 is an improvement over a fixed 8-lane link direct from the northbridge. Where it doesn’t work so well is where you have contention, i.e. both slots trying to max out the lanes via the NF200 at the same time. I suspect this will depend very much on the specific application: how much data you need to shift to/from the cards, and whether you are going to bottleneck with cards trying to use the full bandwidth at the same time. The numbers for my app are better via the NF200s than with fixed 8-lane bandwidth. But if you don’t shift much data to/from the cards, I suspect the latency introduced by the NF200 might start to be significant.
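For reference, the kind of measurement I mean is just a timed host-to-device copy per card, something along these lines (a rough sketch with an arbitrary 256 MB transfer size, not my exact benchmark):

```cpp
// Rough per-device host-to-device bandwidth sketch using pinned memory.
// The 256 MB transfer size is arbitrary; this is not the exact benchmark
// behind the numbers mentioned above.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 256 * 1024 * 1024;   // 256 MB per transfer (arbitrary)
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        cudaSetDevice(dev);

        float *h_buf, *d_buf;
        cudaMallocHost((void**)&h_buf, bytes); // pinned host memory for full PCIe speed
        cudaMalloc((void**)&d_buf, bytes);

        // Warm-up copy so context creation doesn't pollute the timing.
        cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start, 0);
        cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
        cudaEventRecord(stop, 0);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("Device %d: host->device %.1f MB/s\n",
               dev, (bytes / (1024.0 * 1024.0)) / (ms / 1000.0));

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        cudaFree(d_buf);
        cudaFreeHost(h_buf);
    }
    return 0;
}
```

Running that once with a card in an NF200 slot and once with it in a slot wired straight to the X58 is the comparison in question.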

I hope that makes sense. I’m not really an expert nor a very good technical writer so I hope what I’m trying to say is understandable.

But what about the S7025 that was mentioned at the start of the thread? Don’t the dual-Xeon boards have two northbridges, i.e. 2x 5520 chips? The Tyan documentation is not so good, but maybe someone knows how many lanes each 5520 has and how they are routed to the PCIe slots? It could be the case that the 4 double-spaced x16 (electrical) PCIe slots actually each have 16 physical lanes, no? I’m not an expert; does anyone know?

That makes enough sense for me. The application I’m involved with uses the bi-conjugate gradient method to solve a large, sparse system of linear equations. The programmers are using the GPUs to accelerate two matrix-vector multiplications, where the second matrix is the transpose of the first. Right now it’s all programmed for one GPU, but one of the future ideas is to split the two multiplications between two cards. The math is fast enough that just moving the data onto the card takes almost as much time as the computation itself, so we’re worried about splitting any bandwidth between cards. I know that CUDA doesn’t support moving data over the SLI connector, but it seems like that might let you put identical data on two cards sharing an NF200, if the full x16 could be fed to the first card and the SLI connector bandwidth (if it’s large enough) used to copy the data to the second. I’m just a mechanical engineer though, so please let me know if this is a dumb idea before I bring it up with the computer science people.
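To make the data-movement picture concrete, here is a rough sketch of splitting the two products across two cards (the dense matvec kernel is only a stand-in for our real sparse kernels, and the problem size is arbitrary). Each card gets its own copy of the matrix over PCIe; the SLI bridge isn’t involved at all:

```cpp
// Rough two-GPU sketch: y = A*x on card 0 and z = A^T*x on card 1.
// The dense matvec kernel stands in for the real sparse kernels, and
// the problem size is arbitrary. Assumes at least two CUDA devices.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void matvec(const float* A, const float* x, float* y, int n, bool transpose) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n) return;
    float sum = 0.0f;
    for (int col = 0; col < n; ++col)
        sum += (transpose ? A[col * n + row] : A[row * n + col]) * x[col];
    y[row] = sum;
}

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    if (count < 2) {
        printf("This sketch needs at least two CUDA devices.\n");
        return 1;
    }

    const int n = 1024;  // arbitrary problem size
    std::vector<float> A(n * n, 1.0f), x(n, 1.0f), y(n), z(n);

    float *dA[2], *dx[2], *dy[2];
    for (int dev = 0; dev < 2; ++dev) {
        cudaSetDevice(dev);
        cudaMalloc((void**)&dA[dev], n * n * sizeof(float));
        cudaMalloc((void**)&dx[dev], n * sizeof(float));
        cudaMalloc((void**)&dy[dev], n * sizeof(float));
        // Both cards need their own copies; there is no card-to-card path here.
        cudaMemcpy(dA[dev], &A[0], n * n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(dx[dev], &x[0], n * sizeof(float), cudaMemcpyHostToDevice);
    }

    dim3 block(256), grid((n + 255) / 256);
    cudaSetDevice(0);
    matvec<<<grid, block>>>(dA[0], dx[0], dy[0], n, false);  // y = A * x on card 0
    cudaSetDevice(1);
    matvec<<<grid, block>>>(dA[1], dx[1], dy[1], n, true);   // z = A^T * x on card 1

    cudaSetDevice(0);
    cudaMemcpy(&y[0], dy[0], n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaSetDevice(1);
    cudaMemcpy(&z[0], dy[1], n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("y[0] = %f, z[0] = %f\n", y[0], z[0]);
    return 0;
}
```

In that arrangement the matrix is uploaded twice over PCIe, so whatever per-slot bandwidth the NF200 leaves each card is exactly what sets the transfer time.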

As I understand it, the SLI link has a fraction of the bandwidth of the PCI-e bus, and it isn’t really designed for ad-hoc data transfer anyway.

As for the server chipsets, the 5520 is still a single northbridge design, just with two QPI links instead of one. It still has the same 36 PCI-e 2.0 lanes as the X58 does.

OK, I understand that, but there are 2x 5520 chips on the Tyan S7025 board. So are 2x 36 PCIe lanes available?

I can’t help with that. The only Tyan 5520 board I have seen had a single northbridge.

No problem. I’ve asked Tyan to clarify how many lanes service each of the x16 PCIe slots on the S7025 board with its dual 5520 IOHs.

Tyan have confirmed that each of the four x16 PCIe slots on the S7025 has the full complement of 16 lanes allocated.

Page 15 of the manual …