I was wondering if anyone has tried building a machine for CUDA using four 9800GX2 cards. This should give a total of 8 CUDA devices with 128 stream processors each in one machine. You'd probably need 8 CPU cores, an extreme PSU, a special case, plus a PCI-E riser or two.
The hardware I am looking at for this is as follows:
PCI-E riser(s): Ably-Tech Corp. (View Products page)
Because the lowest two PCI-E sockets are adjacent, they need to be spread out to fit two double-width 9800GX2 cards. Hopefully this riser should do the job, but an additional riser for the upper card may be needed to allow it to fit in the lower slot.
Is there anything else I am missing here? Has anyone built a system like this before? It would sound like a small helicopter, but this would be put in a dedicated server room somewhere out of sight.
No, you can't have more than 2 GX2s. I may be wrong and 3 may be possible just for CUDA, but 4 definitely isn't. If you look at the motherboard, the bottom 2 PCI-E slots are too close together to fit a GX2 (double-slot card) in.
If you really want a lot of power and have the money, look into the NVIDIA Tesla systems. You can buy rack-mountable, stackable systems and they = uber pwnage.
"PCI Express x16: four PCI Express x16 connectors supporting simultaneous
transfer speeds up to 4 GB/sec of peak bandwidth per direction and up to 8 GB/sec
concurrent bandwidth."
As PCIe 1.x provides 250 MB/s per lane per direction, this would indicate that all four of these slots have 16 electrical lanes. Since each GX2 puts two GPUs behind one slot, that works out to x8 worth of bandwidth per GPU.
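The arithmetic behind those figures can be spelled out explicitly. This is a minimal host-side sketch (plain C, so it compiles as a .cu file too); the 250 MB/s per-lane rate comes from the PCIe 1.x spec, and the split to x8 per GPU assumes, as above, that the GX2's two GPUs share one slot:

```cuda
// Worked PCIe 1.x bandwidth arithmetic for the slot figures quoted above.
#include <stdio.h>

int main(void) {
    const double mb_per_lane_per_dir = 250.0; // PCIe 1.x payload rate per lane
    const int lanes_x16 = 16;
    const int gpus_per_gx2 = 2;               // a 9800GX2 is two GPUs behind one slot

    double per_slot  = mb_per_lane_per_dir * lanes_x16 / 1000.0; // GB/s, one direction
    double both_dirs = per_slot * 2.0;                           // concurrent bandwidth
    double per_gpu   = per_slot / gpus_per_gx2;                  // effective x8 share

    printf("x16 slot, one direction: %.1f GB/s\n", per_slot);
    printf("x16 slot, concurrent: %.1f GB/s\n", both_dirs);
    printf("per GPU on a GX2 (x8): %.1f GB/s\n", per_gpu);
    return 0;
}
```

That reproduces the 4 GB/s and 8 GB/s numbers from the quote, and shows each GPU on a GX2 effectively gets 2 GB/s per direction.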
I have also been looking at the ASUS L1N64-SLI WS/B, which has four physical x16 slots with the same slot-spacing format as the Skulltrail. It will take dual quad-core Opterons. The lanes on this board are only x8, x16, x8, x16, but it is priced much lower than the Skulltrail. There are a lot of negative reviews on Newegg, mostly concerning DOA boards (sometimes with DOA replacements). The discussions of the Skulltrail on the overclocking forums don’t mention any quality-control problems. However, people with the ASUS seem happy with it once they get one that works.
A “lot of money” is the right phrase! For Tesla, it’s on the order of £2500 per pair of cards (the pair being equivalent to one 9800GX2, which is about £350). This entire system costs about £2700. So yes, it could be done with Tesla, but for the price of one equivalent Tesla system, one could buy 2-3 of these systems.
Plus, AFAIK Tesla cards only support compute capability 1.0, but 9800GX2s support 1.1. The main difference is atomic operations, which could be useful.
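To make the compute capability 1.1 difference concrete, here is a minimal sketch of a global-memory atomic, which is the headline feature 1.0 parts lack. The kernel name and launch shape are just illustrative; with the era-appropriate toolchain you'd build it with `nvcc -arch=sm_11`, and it would not compile for sm_10:

```cuda
// Many threads bump one counter; atomicAdd serializes the updates so no
// increments are lost. Requires compute capability 1.1 or later.
#include <stdio.h>

__global__ void countHits(int *counter) {
    atomicAdd(counter, 1);  // race-free increment of a global-memory word
}

int main(void) {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) {
        printf("no CUDA device available\n");
        return 0;
    }
    int *d_counter, h_counter = 0;
    cudaMalloc(&d_counter, sizeof(int));
    cudaMemcpy(d_counter, &h_counter, sizeof(int), cudaMemcpyHostToDevice);
    countHits<<<32, 256>>>(d_counter);  // 8192 threads hitting one counter
    cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost);
    printf("counter = %d (expected %d)\n", h_counter, 32 * 256);
    cudaFree(d_counter);
    return 0;
}
```

On 1.0 hardware you'd have to restructure the algorithm (e.g. per-thread partial results plus a reduction) to get the same effect.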
This build wouldn’t work for games. At the moment, the drivers only support quad SLI, and since each 9800GX2 is really two cards sandwiched together, two 9800GX2s is the most games can use. Four would not be any better; this is strictly for use as a high-end CUDA system.
That ASUS board looks interesting; it’s good to know there are alternatives available.
I wasn’t planning on water-cooling the GPUs. There are water-cooled 9800GX2s around, but the ones I’ve seen are about twice as expensive, which would roughly double the cost of the entire system once you add a pump etc. In my situation, I can move the system to a remote server room, so the noise is not a problem and I can put on as many extra fans as will fit (which, in a case with 7 5.25" bays on the front, is quite a lot).
I’m not sure, so I’m just skeptical, but are you guys SURE that the drivers support 4 GX2s? 8 GPUs? If you are then fair enough and I’ve learned something new, but I’d be surprised if NVIDIA made sure that 4 GX2s are supported, given that you can’t actually fit 4 on a motherboard without special risers etc etc…
No, I’m not sure at all; I think the only way to be sure would be to try it. However, I don’t see any particular reason why it shouldn’t work. This is a system for computation, not for graphics or gaming, so it doesn’t need SLI or even multiple monitors. There are some reviews with photographs of 10 monitors connected to a single PC, so I believe it should work.
If it does not work, then most of the hardware could be split into two systems with two cards each, so there wouldn’t be much waste in that case.
Another possibility might be three 9800GX2s and one 8800GTS 512. Supposedly the 9800GX2 is internally two 8800GTS 512 cards, so all the software should (might) recognize this as seven identical GPUs. This might be easier mechanically than the riser/cable assembly.
If you build one of these multi-GPU systems, let us know how it works out.
Technically, you don’t even need a graphical interface running, or even installed, to run CUDA code. That’s what I’m doing: I have a CentOS server without Xorg installed, equipped with an old PCI card for a text terminal and a 9800 GTX that is used solely for running CUDA code. So even if the GUI crashes with too many cards (for reference, see the answers to that Slashdot thread, or search for the “multiseat X” keywords: lots of local multi-user X window server setups are reported, but some exhibit instability with too many graphics cards), that won’t pose any actual problem in your situation.
As for CUDA, according to the documentation, the API simply treats each GPU as a separate CUDA device, so with your installation you would simply end up with 8 different devices to which you could assign 8 different CUDA processes. (And the 2x quad-core should provide more than enough horsepower to run the CPU side of those tasks.) The main problem I see is memory and bandwidth limitations: the GX2 has only 512 MB per GPU, and 8 CUDA tasks are going to put significant stress on the bus whenever each transfers data between host and device. I don’t know, maybe AMD processors fare better, as they have an on-board memory controller and their dual channels can be decoupled (thus a dual-processor system features 4 independent simultaneous accesses to memory).
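A minimal sketch of what that looks like from the runtime's side: each GPU enumerates as its own device, and a host process picks one with `cudaSetDevice()` and then owns it. The device count and names here are whatever the machine actually has; on the hypothetical quad-GX2 box this should report 8 entries:

```cuda
// Enumerate every CUDA device the runtime can see and print its name and
// memory. In the real 8-process setup, process d would call cudaSetDevice(d)
// once at startup and run all its kernels on that GPU.
#include <stdio.h>

int main(void) {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess)
        deviceCount = 0;  // no driver / no device: report zero rather than fail
    printf("CUDA devices detected: %d\n", deviceCount);

    for (int d = 0; d < deviceCount; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("  device %d: %s, %lu MB global memory\n",
               d, prop.name, (unsigned long)(prop.totalGlobalMem >> 20));
    }
    return 0;
}
```

On a quad-GX2 system each of the 8 entries would show its own 512 MB, which is the per-GPU memory ceiling mentioned above.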
Yeah, this kind of 8-device configuration only makes sense for kernels in which all of the host<->device transfer can be done up front, and then minimal host communication is needed per iteration. There are many problems where this is possible.
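That "transfer once, iterate on-device" pattern can be sketched like this. The kernel here (scale each element by 2) is just a stand-in for real per-iteration work; the point is that the only bus traffic is one copy in and one copy out:

```cuda
// One host->device copy up front, many kernel launches, one device->host copy
// at the end -- the access pattern that keeps 8 GPUs off the shared bus.
#include <stdio.h>
#include <stdlib.h>

__global__ void step(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;  // placeholder for the real per-iteration work
}

int main(void) {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) {
        printf("no CUDA device available\n");
        return 0;
    }
    const int n = 1 << 20, iters = 10;
    float *h = (float *)malloc(n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);  // once, up front

    for (int it = 0; it < iters; ++it)       // no host<->device traffic in here
        step<<<(n + 255) / 256, 256>>>(d, n);

    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);  // once, at end
    printf("h[0] = %.0f after %d doublings\n", h[0], iters);
    cudaFree(d);
    free(h);
    return 0;
}
```

Problems that need a host round-trip every iteration would instead have 8 tasks fighting for the bus on every step.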
I wonder what kernel startup bottlenecks there are. It’s possible that even with 8 CPU cores, there might be some OS-level contention which prevents short kernels from being issued to all 8 devices simultaneously. Would be interesting to benchmark for sure…
Looking at this again: the GX2s are double-wide, but this does not mean they are usable in two PCIe slots that are double-spaced. The sides of any two adjacent cards will be right up against each other, and the outside air intake is through the back of the card while the exhaust is through the sides. Have a fire extinguisher handy. Water cooling will not help: the cooling block is sandwiched between the two boards and the overall width of the assembly is not reduced; in fact, it may be slightly increased. Also, the water blocks I have seen have the hose connections on the side, making double spacing impossible.
I cannot see any combination of risers that would allow four GX2s on the Skulltrail or any other board. Three looks possible: card 1 in slot 1; card 2 on a riser (which also offsets by one slot) in slot 3; card 3 on a riser/offset in slot 6. Custom enclosure for sure.
There are a variety of industrial PCIe backplanes on the market; perhaps one of these could work for you. However, once you move beyond commodity hardware, costs tend to skyrocket. What about a small cluster with MPI? A single quad-core CPU and two 9800GX2s per node, on a motherboard with 2 (or more) gigabit Ethernet ports.
Perhaps I don’t have the right picture of what you are trying to do. Let me know.