4U rackmount case that will fit 4 GTX 295 cards — looking for a reasonable rackmount that'll do just that

Most, if not nearly all, 4U rackmount cases have only seven PCI slots. Are there any with eight that still fit the 4U standard?
Additionally, is the MSI 790FX motherboard suitable for this use?

Cheers,
Johan

That’s not a good motherboard for 4 GTX 295s. It has one switch feeding two x16 PCIe slots, but then only two unswitched x8 slots. Since each GTX 295 is already sharing its slot’s bandwidth between two GPUs via its own onboard switch, that hurts even more.

This may be OK if your CUDA apps don’t depend on PCIe transfer speeds, but it will definitely hurt the many apps that do.

When you have that many cards, an i7 (with its huge memory bandwidth) is useful, since it can spool RAM fast enough to feed multiple GPUs.
The obvious motherboard for a 4× GTX 295 system would be the Asus P6T7 Supercomputer. It has two x16 switches to give each slot access to full bandwidth. A cheap i7 920 is fine; that still gives you the RAM throughput.

Again, your apps may not need fast memory transfer to and from the GPUs, but it’s such a common bottleneck you need to think about optimizing it.
When you’re spending $2000 in GPUs, it’s not a big deal to spend a couple hundred extra to make sure your CPU side can feed them.
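To make the slot argument above concrete, here is a back-of-envelope sketch of worst-case per-GPU bandwidth. The numbers are assumptions, not measurements: PCIe 2.0 moves roughly 500 MB/s per lane per direction after 8b/10b overhead, each GTX 295 holds two GPUs behind one onboard switch, and the sketch ignores that the MSI board’s two switched x16 slots also share one upstream link when both transfer at once (which only makes the comparison worse).

```python
# Back-of-envelope PCIe bandwidth per GPU (assumption-laden sketch).
# PCIe 2.0 carries roughly 500 MB/s per lane per direction after
# 8b/10b encoding; each GTX 295 puts TWO GPUs behind one onboard
# switch, so slot bandwidth is halved again per GPU.

MB_PER_LANE = 500  # approximate PCIe 2.0 payload rate, MB/s per lane

def per_gpu_bandwidth(lanes):
    """Worst-case simultaneous MB/s per GPU on a dual-GPU card."""
    return lanes * MB_PER_LANE / 2  # two GPUs share the slot

# MSI 790FX-style layout: two x16 slots (switched) plus two x8 slots
msi = [per_gpu_bandwidth(l) for l in (16, 16, 8, 8)]
# Board with four full x16 slots (e.g. switched, like the P6T7)
full = [per_gpu_bandwidth(l) for l in (16, 16, 16, 16)]

print(msi)   # the x8 slots leave each GPU only ~2 GB/s
print(full)  # ~4 GB/s per GPU everywhere
```

The point is simply that the x8 slots cap each of their four GPUs at roughly half the transfer rate of the x16 slots, which is exactly where bandwidth-sensitive CUDA apps will feel it.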

seibert did a post a while back about this, where he used Protocase (http://www.protocase.com) to build a custom case for a 4U rackmounted GPU server:

http://forums.nvidia.com/index.php?showtopic=106040

Hmm, I didn’t find any better mainstream AM3 motherboards — are there any? I have an AM3 CPU I had hoped to use. But it seems Intel might be the way to go. What about the ASRock X58 SuperComputer? It seems to be a popular choice.

But when it comes to cases, why not just go for this: http://www.supermicro.com/products/chassis…747TQ-R1400.cfm — 1400 W redundant PSU, built for 4× GTX 295 cards, $820 USD.

And thanks for the informative reply :)

The company that designed the custom case I used worked, I believe, with ASRock motherboards as well as ASUS. I personally had a very bad experience with the ASRock Supercomputer motherboard under Red Hat Enterprise Linux, whereas the ASUS Supercomputer motherboard worked wonderfully. The price difference is a few hundred dollars, but if you are already spending that much on four GTX 295 cards, I think it is worth it.

And yes, I think for this many GPUs, the Core i7 series with triple channel memory is the way to go.

Yeah, that also looks fine. I wanted something small that could slide onto an existing shelf, and I wanted to test out Protocase as a company for potential future jobs. That custom case was $300 (much less if you want several) and is about as small as you can build a quad-card system without going to liquid cooling. The original designer wanted a computer that would fit in carry-on luggage. My space constraints were not quite so extreme. :)

You can also find much cheaper cases by Antec and Lian Li which have 8-10 slots in the back (no PSU), and therefore could also hold four GTX 295 cards.

I see. By the way, the Supermicro case is 4U rackmount-compatible. Using the Protocase could work, especially at $300, combined with a 1250 W Cooler Master PSU at $220 a pop. A requirement for me is that the case be rackmountable :) (I have also considered open racks.) But I do think a Protocase would cost more like $400 a pop, at least from the quote I got.

Ah, ok, I went back in my email and looked at the quote I received. It was $343, so it is pretty close to what you are seeing now. Be warned that the Protocase enclosure is very tight and lacks the amenities of a commercial case.

If you don’t want to spend $820 on the Supermicro case, you could also get this one + the Enermax PSU:

http://www.newegg.com/Product/Product.aspx…N82E16811129058

And purchase a $50 rackmount shelf. :)

(No redundant PSU, but if you really need that, then the Protocase isn’t an option either.)

There’s another thread where we discussed 4× GTX 295 designs (this is a recurring topic).
My new build (for 2× Fermi + 2× GTX 295, likely) will experiment with the 90-degree-rotated Fortress FT02. Eight slots, and the rotated motherboard should significantly help with GPU airflow. (That Fermi system NVIDIA showed off at CES also used a rotated motherboard. Coincidence?)

For 2-GPU cases, I love the P183. But it doesn’t have enough GPU-specific airflow, so I’ve seen reports of it getting hot with 3 GPUs and really hot with 4. Maybe the P193 would do better, but that FT02 looks most promising.

Maybe the Tyan FT48B7025 could be suitable for you: http://www.tyan.com/product_SKU_spec.aspx?..p;SKU=600000164 ; from the pictures it looks like it has eight slots.

This barebone uses the Tyan S7025 motherboard, I believe: http://www.tyan.com/product_SKU_spec.aspx?..p;SKU=600000040

which is fitted with two Intel 5520 Tylersburg PCIe hubs providing 36 PCIe lanes each. This means all four of your GTX 295s can communicate at PCIe v2.0 x16.

Cheers,

peter

The Tyan could be useful, depending on its price, did you find any info regarding that?

And the FT02 isn’t 4u?

People have observed unexpectedly low host-to-device and device-to-host performance on these motherboards:

http://forums.nvidia.com/index.php?showtopic=104243

I don’t know if this is a BIOS problem, but I don’t think anyone has reported it being fixed yet.

Well, not for the 4U barebone, but I did ask http://www.workstationspecialist.com/ in the UK to quote me for a complete system based on the Tyan S7025 motherboard. The base system (motherboard, case, single CPU, 3 GB RAM, 1 TB HDD, DVD, etc.) was quoted at £1,990 in July 2009. They are using Lian-Li cases, and I guess all that could be cheaper now.

Hmm, I did not find a reference to FT02, but there is a KFT48 which is 4U.

Cheers,

peter

Oops. I hope there is a fix in the pipeline somewhere.

I have done preliminary measurements on an X58-based trial system with a single Tesla, and the transfer speeds were symmetrical at roughly 5.7 GB/s. Our full system would in essence require two such X58 machines in one; that’s why I assumed a Tyan S7025 could provide exactly that, at least on paper.

Are there any other motherboards that can operate 4 PCIe v2.0 slots at x16? The thread mentioned by seibert talks about the NF200 towards the end; would EVGA’s dual-socket 270-GT-W555 beast of a motherboard do the job?

Cheers,

peter

Well, if the lower simultaneous bandwidth of four x16 links driven by two NF200 chips is fine, you can also use the Asus P6T7 Supercomputer motherboard for an equivalent single-socket solution.

Maybe some notes on the context are in order. I’ll need to process roughly 1.8 GB/s: 16 MB every 8.5 ms. Data is captured by two custom PCIe x8 cards occupying two PCIe slots, and two processes (one per card) control the data transfer into pinned memory every 8.5 ms. Once that is done, they wake up two follow-up processes, which transfer the 8 MB in pinned memory onto a Tesla each (two Teslas in total) and launch some kernels. Everything obviously has to finish within 8.5 ms; the data transfer time eats into the time budget.

Using a single Tesla on an X58 test system, I have measured 5.7 GB/s, which means 8 MB would transfer in 1.4 ms, leaving 7.1 ms for processing. If, however, the two Teslas step on each other’s toes on the same PCIe bus, then the transfers might be compromised. This is what I would like to avoid, and this is why the Tyan S7025 looked so appealing.
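The per-frame budget above reduces to a two-line calculation, assuming the measured 5.7 GB/s pinned-memory rate holds on the final machine:

```python
# Time-budget sketch for the pipeline described above: 8 MB onto each
# Tesla every 8.5 ms, assuming the ~5.7 GB/s host-to-device rate
# measured on the X58 test system.
FRAME_MS = 8.5   # hard real-time period, ms
CHUNK_MB = 8     # pinned-memory payload per Tesla per frame
RATE_GBS = 5.7   # measured transfer rate, GB/s (= MB per ms * 1000)

transfer_ms = CHUNK_MB / (RATE_GBS * 1000) * 1000  # MB divided by MB/ms
compute_ms = FRAME_MS - transfer_ms

print(round(transfer_ms, 1), round(compute_ms, 1))  # ~1.4 ms, ~7.1 ms
```

If contention between the two Teslas halved the effective rate, the transfer alone would eat roughly 2.8 ms of the 8.5 ms budget, which is why keeping the two transfer paths independent matters.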

A lot of data transfer is going on, partly in parallel, partly in series, and I would like to be sure that all can be done on time.

Cheers,

peter

OK, in that case you probably want to avoid two cards sharing an NF200 PCI-Express switch, or they might start to step on each other in a potentially non-deterministic way (and also add latency to small transfers). The dual-socket EVGA board shares one set of PCI-Express paths between both CPUs and uses NF200 chips to fan x16 out to four of the card slots, so I don’t think it will meet your requirement.

The Tylersburg boards, like the Tyan, sound like a perfect match, but you might want to contact the vendor to see if they are aware of the PCI Express bandwidth issues, are fixing the issues, and/or ensure you can return the board if it doesn’t work for you. (If it does work for you, be sure to update us!)

It isn’t just the PCIe bus you have to worry about. I understand a QPI link is only capable of about 10 GB/s unidirectional transfer, so obtaining and maintaining correct processor affinity is going to be critical; otherwise you could find yourself with another bottleneck on your hands. I have no experience with this sort of application, but I have done quite a lot of hard real-time work for control and instrumentation, and I must say my gut feeling is that you are being pretty ambitious, especially if you’re planning on using a standard operating system.

I see the potential problems here. Actually, I want to run everything in a fairly symmetric fashion: one DAQ card plus one Tesla on one PCIe hub, with the two respective processes (or threads) running on the CPU attached to that hub, and likewise for the second set of DAQ card, Tesla, etc. Processor affinity will be quite important as far as threads and memory allocation are concerned. I have been using Linux, including real-time Linux variants, for quite some time; I used Linux for the first trials and plan to continue with it, perhaps with one kernel patch or another. I hope I will be able to control these aspects in a fine-grained manner. The challenge is still ahead.
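As a minimal sketch of that per-hub pinning scheme (Linux-only, via the standard `os.sched_setaffinity` call): the CPU-to-hub mapping below is entirely hypothetical, so check the real topology with `numactl --hardware` before pinning anything.

```python
# Sketch of the symmetric per-hub affinity scheme described above.
# Linux-only; HUB_CPUS is a HYPOTHETICAL mapping (CPUs 0-3 on the
# socket serving PCIe hub 0, CPUs 4-7 on the socket serving hub 1).
# Verify the real layout with `numactl --hardware`.
import os

HUB_CPUS = {0: {0, 1, 2, 3}, 1: {4, 5, 6, 7}}

def pin_to_hub(hub):
    """Pin the calling process to the CPUs nearest the given PCIe hub.

    Intended to be called once, early, by each DAQ/Tesla process.
    Returns the resulting affinity set, or None if unavailable.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # non-Linux platform: no affinity control here
    cpus = HUB_CPUS[hub] & os.sched_getaffinity(0)
    if not cpus:
        return None  # none of the mapped CPUs exist on this machine
    os.sched_setaffinity(0, cpus)
    return os.sched_getaffinity(0)

# e.g. the DAQ+Tesla pair attached to hub 0 would call pin_to_hub(0)
```

Memory locality needs the same treatment: allocating (and touching) the pinned buffers from an already-pinned process keeps them on the local NUMA node, or `numactl --membind` can enforce it externally.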

I’ll keep the list posted on my progress, or more likely on my issues.

Cheers,

peter

Just got my FT02 case in preparation for my new supercomputer build when Fermi is released.

Bad news… the FT02 has only 7 slot cutouts, not 8! This is strange because it’s the same interior frame as the 8-slot Raven02, and I didn’t recheck this. It looks like a last double-wide card will still fit easily, but its vent would exhaust (unobstructed) into the case, not out the backplane. This may still be OK; I’ll see.

Otherwise, it’s clear from inspection that this case will do really well with the multi-GPU issues: huge fans blowing right at the GPUs, and the rotated motherboard moving the heat up rather than sideways. I think it’s going to work well. We’ll see in three weeks when the Fermis come out and I can finish the build.

I would be interested to find out about all of your experience with these frames and motherboards.

I was sent a link to this discussion by Allan MacKinnon through the GPGPU group in LinkedIn.

I’m trying to build a neurosurgery simulator on a budget. I was first thinking of 4-7 GTX 295s, but am thinking of waiting for the GTX 470 and 480 to come out, while still considering a few C2050s.

This simulator will combine multi-grid finite elements with GPU accelerated Total Lagrangian Explicit Dynamics FE, and there may be some communication involved between the different levels of resolution. I’m hoping for a fairly sparse representation of the brain and a subvolume of it at a finer level: 1000 elements or so in each.

I understand that there may be some bandwidth advantages to GDDR5 memory compared to GDDR3.

Someone has also suggested that it is important that all PCIe slots be linked to the northbridge. Can anyone comment on these boards?

With regards to cooling, Kitware is discouraging me from using water cooling, or at least encouraging me to buy ready-assembled reliable hardware, which is almost saying the same thing. The FT02 looks promising from a cooling standpoint. How many GPUs have people been able to use on this machine? Any software reliability issues?

Thanks for your kind comments.

Michel