I wanted to report that today I successfully built one of the Manifold Custom Revision 2 cases described here:
[url="http://forums.nvidia.com/index.php?s=&showtopic=89891&view=findpost&p=539172"]http://forums.nvidia.com/index.php?s=&showtopic=89891&view=findpost&p=539172[/url]
I followed much of Dimitri Rotow’s parts list with a few deviations:
- Enermax Galaxy 1250W PSU (in case my employer is reading this: yes, this power supply is UL-listed. Safety first! :) )
- Intel Core i7 920 (2.67 GHz)
- 4x EVGA GTX 295 cards (the new single PCB variety)
- Manifold Custom Case Rev 2 from Protocase.com
- 3x Thermaltake A2018 120mm fans (blue LED and various speed control options)
- Rosewill RFT-120 120mm fan filter
- Patriot Viper 12 GB PC3-10666 RAM kit (6x 2GB modules)
- Intel X25-M (G1) 80 GB Solid State Disk
- Asus P6T7 WS Supercomputer motherboard
I have no affiliation with Manifold, so take this as the perspective of an outsider working with the Manifold case. (I do have lots of experience assembling computers, so I’ll mostly focus on the unique features of this system.)
Although I want to write a much more complete document in the future (with photos!), here are some of my initial impressions before I forget them. Many of these things are in the “Building an E Box” PDF, but I’m repeating them here because I didn’t appreciate their importance when I read through the document the first time. You should not take this post as a substitute for reading the Manifold document, however. It’s very informative!
======
The Case:
-
Protocase offers truly amazing service. I highly recommend that you browse around the documentation PDFs on their website. You’ll learn a lot about enclosure design and working with sheet metal. Protocase also recently popped up in the news as the manufacturer of the Backblaze Storage Pod, which holds 45 disks in a $750 4U rackmount enclosure.
-
Ordering the case was pretty easy, though the sales rep I emailed for a quote initially did not know what the “Manifold Custom Case Rev 2” was. I pointed her to Dimitri Rotow’s forum post above, and within a day I had a quote in hand: $343 for one case, dropping to $204 each for an order of 10.
-
My case shipped from Nova Scotia; FedEx International shipping was included in the above prices, so I didn’t even realize it was coming from Canada until I got the tracking number.
-
The case build quality is impressive, especially with the powder coat on all surfaces. (I went for leaf green, just because I was tired of black and beige.) Everyone in the office spent a few minutes admiring the case before it was whisked off to the lab.
======
Assembly:
-
The case comes with only the screws required to hold the sheet metal together, but none for mounting the computer parts. The Thermaltake fans come with suitable screws, nuts and washers, and the Enermax power supply has its own screws as well. You will need to supply 9 screws for the motherboard, 4 screws to hold down the graphics cards, and whatever screws are required to hold down the disks you install. (I didn’t use any hard disk screws, but more on that later.) Motherboard standoffs are built into the case bottom. If you have a bag of miscellaneous computer screws, you should be in good shape.
-
There is not a wasted cubic centimeter in this case! You need to follow the assembly order in the Manifold documentation. I made my life a little difficult by installing all four GTX 295 cards before installing the SATA cable. Fortunately the P6T7 has angled connectors, so you can get under the graphics card and finesse the cable in if you have thin fingers (or forceps).
-
Getting the GTX 295 cards installed is a rather harrowing game of 3D Tetris. As the documentation states, you do need to flex the back of the case gently to get the card backplate lip around the obstructions. Everything springs back just fine, though.
-
The X25-M is a 2.5" form factor, which means the screw holes in the case lid are not spaced appropriately to mount it directly. Instead my plan was to put the drive into a Icy Dock 2.5" to 3.5" SATA converter. I’ve used this enclosure before, and it was great. However, it seems to be a little longer than a normal 3.5" drive and collided with the PSU when oriented with the cables facing away from the PSU. Turning it around just barely worked if you flexed the cables at a hard angle going into the connector.
-
The manual isn’t kidding about needing standoffs for the hard drive. It is impossible to get the connectors in if the drive is flush (especially since the Enermax angled SATA power connectors bend the wrong way). It turns out that if you have some motherboard standoffs lying around, those work in a pinch instead of nylon ones. (Nylon would provide better vibration isolation, but whatever. :) )
-
As it happened, my Icy Dock enclosure was defective, so I finally decided to just velcro the X25-M (which is very small and light) against the back of the case through the optional 40 mm fan holes. This works really well, and I would highly recommend doing it if you use an SSD instead of a rotating disk.
-
You can reach the power switch header on the P6T7 from the front-left corner of the case if you crack the lid. This is handy if you need to short its pins horizontally with a screwdriver (be careful!) because you forgot to set the BIOS to auto power-on before installing everything. :)
======
Operations:
-
For other reasons, I had to install Scientific Linux 5.3 (a RHEL 5.3 rebuild, like CentOS). I turned off the Marvell SAS controller and set the SATA controller to AHCI mode in the BIOS.
-
RHEL 5.3 and the CUDA 2.3 driver had no problem recognizing all 8 GPUs across the four cards on the P6T7 motherboard. There is one BIOS update on the Asus website that I did not apply, since everything worked the first time.
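If you want to sanity-check the device count on your own build, here is a minimal sketch against the CUDA runtime API (generic CUDA code, nothing specific to this system):
[code]
// devcount.cu -- list every CUDA device the driver can see.
// Build with: nvcc -o devcount devcount.cu
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed -- is the driver loaded?\n");
        return 1;
    }
    printf("%d CUDA device(s) found\n", count);   // expect 8 with 4x GTX 295
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("  device %d: %s, %ld MB\n", i, prop.name,
               (long)(prop.totalGlobalMem >> 20));
    }
    return 0;
}
[/code]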
-
There was some unusual CPU clock ramping behavior initially. /proc/cpuinfo said the cores were stuck at 1.6 GHz, even under load. Strangely, top showed the 8 single-threaded CPU-bound processes each using 250% CPU, so the system was clearly in some kind of confused superposition of maximum and idle clock rates. I finally just forced the CPU to run at full clock all the time in the BIOS. (Perhaps the BIOS update helps here; I haven’t tried it.)
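If you want to watch what the kernel thinks each core is doing, the cpufreq sysfs files are enough. A quick sketch (assuming the cpufreq driver is loaded; the path is the standard sysfs location):
[code]
// cpufreq.c -- print the frequency the kernel reports for each core.
// Assumes the cpufreq sysfs interface is available.
#include <stdio.h>

int main() {
    char path[128];
    for (int cpu = 0; cpu < 64; ++cpu) {
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_cur_freq", cpu);
        FILE *f = fopen(path, "r");
        if (!f) break;             // no more cores
        long khz;
        if (fscanf(f, "%ld", &khz) == 1)
            printf("cpu%d: %.2f GHz\n", cpu, khz / 1e6);
        fclose(f);
    }
    return 0;
}
[/code]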
-
Idle, the system draws about 600W at the plug. (Note this is with the clock rate forced to max.)
-
As you activate more CUDA devices, the power usage ramps up. I observed a maximum of 1100W, but my test jobs might not have loaded every single CUDA device simultaneously, so consider that value a lower bound on the maximum draw. :)
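To remove that caveat, something like the following would pin a make-work kernel on every device at once, one child process per GPU. Consider it a sketch only; I haven’t measured with this exact code, and the kernel body is arbitrary busywork:
[code]
// burn.cu -- put a make-work kernel on every CUDA device at once,
// one child process per GPU, to measure worst-case power at the plug.
// Build with: nvcc -o burn burn.cu
#include <cstdio>
#include <unistd.h>
#include <sys/wait.h>
#include <cuda_runtime.h>

__global__ void spin(float *out, int iters) {
    float x = (float)threadIdx.x;
    for (int i = 0; i < iters; ++i)
        x = x * 1.000001f + 0.5f;           // pointless math to keep the ALUs busy
    out[blockIdx.x * blockDim.x + threadIdx.x] = x;
}

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);             // safe before fork: no context yet
    for (int dev = 0; dev < count; ++dev) {
        if (fork() == 0) {                  // child owns exactly one GPU
            cudaSetDevice(dev);
            float *d = 0;
            cudaMalloc((void **)&d, 256 * 256 * sizeof(float));
            for (int pass = 0; pass < 1000; ++pass)
                spin<<<256, 256>>>(d, 100000);
            cudaThreadSynchronize();        // the CUDA 2.x-era sync call
            _exit(0);
        }
    }
    while (wait(NULL) > 0) {}               // parent waits for all children
    printf("all %d devices done\n", count);
    return 0;
}
[/code]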
-
Power usage immediately ramps down when the cards go idle. When all the jobs finished, the power draw was back at 600W.
-
I’m still trying to figure out how to monitor the temperature sensors in the system. sensors-detect in the lm_sensors package was able to detect the ADT7473 chips on the GTX 295 cards. (There appears to be one per card, not one per CUDA device.) However, the kernel shipped with RHEL 5.3 did not have a driver for this chip. It does appear in later Linux kernels, and it looks like the driver might be backported to the RHEL 5.4 kernel (which has the same version number as 5.3, but Red Hat modifies the stock kernel quite a bit). The nvidia-smi tool does not seem to be able to read temperatures from a GTX 295.
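Once a kernel with the adt7473 driver is running, the readings should show up as standard hwmon sysfs files, and polling them is easy. A sketch (hwmon numbering varies by machine and kernel, so treat the paths as assumptions):
[code]
// readtemp.c -- poll every hwmon temperature input and print it.
// Assumes a kernel that ships the adt7473 hwmon driver; on some kernels
// the files live under hwmonN/ instead of hwmonN/device/.
#include <stdio.h>

int main() {
    char path[128];
    for (int dev = 0; dev < 16; ++dev) {        // hwmon numbering varies
        for (int t = 1; t <= 3; ++t) {          // ADT7473 exposes temp1..temp3
            snprintf(path, sizeof(path),
                     "/sys/class/hwmon/hwmon%d/device/temp%d_input", dev, t);
            FILE *f = fopen(path, "r");
            if (!f) continue;
            int millideg;
            if (fscanf(f, "%d", &millideg) == 1)
                printf("%s: %.1f C\n", path, millideg / 1000.0);
            fclose(f);
        }
    }
    return 0;
}
[/code]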
-
The back of the computer gets really warm when the GPUs are operating, hence my interest in monitoring the temperature more closely. I’ve run GPUs this hot for extended periods of time, so I’m not immediately concerned, but it is worth keeping an eye on.
======
Misc Hardware:
-
The Asus P6T7 is a really fantastic motherboard! I also have an ASRock Supercomputer motherboard, and it has not impressed me. I found the ASRock BIOS buggy and hard to update without a floppy drive or Windows, and even after finally updating it, it was only slightly less buggy. (In RHEL 5.3, it constantly throws spurious ATA timeout errors on ports with no devices attached. There is also a really weird, inexplicable BIOS interaction between the on-board Ethernet and the FireWire port.) In contrast, the Asus motherboard worked nearly flawlessly (I’m not sure whose fault the CPU clock rate issue is), and it is really well built. The P6T7 costs more, but if you have the budget, it is well worth it.
-
This isn’t CUDA-related, but the X25-M is a mind-blowing device. I’m a little late to the SSD party, but the performance improvement over magnetic disk is amazing. Even when I deliberately oversubscribed the virtual memory, the system stayed responsive (though a little sluggish) under constant swapping to the SSD. Running out of real memory is no longer quite the EPIC FAIL that it used to be with rotating disk.
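(For the curious: my “oversubscribe” test was nothing fancier than grabbing and touching more memory than the 12 GB installed, along the lines of this sketch. Run it at your own risk, since it will push everything else out to swap.)
[code]
// memhog.c -- allocate and touch more RAM than the machine has, forcing
// the kernel to swap (onto the SSD, in this build). Run at your own risk.
// Usage: ./memhog 16    (GB to touch; pick something above physical RAM)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv) {
    long gb = (argc > 1) ? atol(argv[1]) : 16;
    for (long i = 0; i < gb; ++i) {
        char *p = malloc(1L << 30);
        if (!p) { fprintf(stderr, "malloc failed at %ld GB\n", i); break; }
        memset(p, 1, 1L << 30);   // touch every page so it really commits
        printf("%ld GB touched\n", i + 1);
    }
    pause();                      // hold the memory until you Ctrl-C it
    return 0;
}
[/code]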
======
Anyway, sorry for the length of the post. As I mentioned, I want to study the power and heat profile of this system so I can decide under what load it is suitable for 24/7 operation. (Shortening part lifetimes is acceptable, but locking up and getting wrong answers is not.) I will probably switch to Ubuntu 9.04 in order to get the lm_sensors support. I will also be investigating failure modes (like a dead case fan) to see what sort of automated shutdown settings are required; a first cut is sketched below. My goal is graceful failure and recovery, rather than 100% uptime. At ~$3800 each, this system could completely break every year and still save us tons of money. :)
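The automated shutdown I have in mind is just a polling loop along these lines. The sensor path and the 95 C limit are placeholder assumptions, not measured values for these cards, and it needs root to actually halt the machine:
[code]
// watchdog.c -- halt the machine if a sensor goes over a limit.
// Both values below are placeholder assumptions: point SENSOR at
// whichever hwmon file maps to your hottest spot, and tune LIMIT_MDEG
// to your cards. Needs root so that shutdown can run.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define SENSOR "/sys/class/hwmon/hwmon0/device/temp1_input"
#define LIMIT_MDEG 95000   /* 95 C, in the millidegrees hwmon reports */

int main() {
    for (;;) {
        FILE *f = fopen(SENSOR, "r");
        int mdeg = 0;
        if (f && fscanf(f, "%d", &mdeg) == 1 && mdeg > LIMIT_MDEG) {
            fprintf(stderr, "sensor at %d mC, halting\n", mdeg);
            system("/sbin/shutdown -h now");   // graceful failure, not uptime
        }
        if (f) fclose(f);
        sleep(10);
    }
    return 0;
}
[/code]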
I’d like to thank Dimitri Rotow and Manifold.com for publishing their case design! It’s helped give me a big head-start on my project (and introduced me to the exciting world of custom fabrication).