First post. I looked all over for a FAQ or beginner's guide but didn't find anything that answered my questions… apologies if I've missed some obvious post and/or am asking stupid questions.
Basically, thanks to the credit crunch I suddenly have plenty of free time on my hands (until recently I was a quant at an investment bank), so I've hatched a plan to build a low-cost Linux box and get up to speed with CUDA, focusing on HPC applications for finance.
To be honest I don't really know where to start with hardware. I'm going to go with a 9-series GPU since I want double precision, but I have no idea what motherboard to get. In fact I don't even know how many cards I need: if the GPU is busy with CUDA, then what handles the graphics? Should I go with, say, a mobo with onboard graphics to drive the display, so the PCIX card is dedicated to CUDA?
For starters, what about the ASUS P5N7A-VM or EVGA 780i? Should I get a mobo with lots of PCIX slots? Should I go for a quad core, or is a dual core OK (I read something about one CPU core per GPU…)? 32-bit or 64?
All advice gratefully received.
The GeForce 8 and 9 series are single precision only. It is the GTX 2xx series (the 260 and 280, for now) that carries double precision support, although at a much reduced performance level compared to single precision. Each multiprocessor in the GPU (there are 30 in the GTX 280) has 8 single precision units and 1 double precision unit, so keep that 8:1 factor in mind.
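If it helps, a quick way to check what a given card actually supports is to query its compute capability with the runtime API; cards of compute capability 1.3 or higher (the GTX 260/280) have the double precision unit. A minimal deviceQuery-style sketch:

```cuda
// Sketch: list each CUDA device's compute capability to check for
// double precision support (compute capability >= 1.3).
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: %s, compute capability %d.%d, %d multiprocessors\n",
               dev, prop.name, prop.major, prop.minor, prop.multiProcessorCount);
        if (prop.major > 1 || (prop.major == 1 && prop.minor >= 3))
            printf("  -> supports double precision\n");
    }
    return 0;
}
```

(Compile with nvcc; the SDK's deviceQuery sample prints the same information in more detail.)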
If you only have one GPU, then the GPU processes on-screen graphics between CUDA kernel calls. For this reason, there is a watchdog timer enforced that prevents a single CUDA function call from running longer than 5 seconds. Note that if you have two graphics cards, the watchdog is not enforced on the second card. (There have been driver bugs and OS limitations that complicate that statement, but on Linux I believe it is true.)
Since you are posting in the CUDA on Linux forum, I’ll assume that’s your target OS. On Linux, you can also avoid the watchdog with one card if you just don’t start up X. This is how many people who build dedicated CUDA workstations use it. In practice, most kernel calls are so short, the watchdog is not a huge consideration.
My personal suggestion would be to build a system that can take two GPUs, but only install one while you learn CUDA. Then, if things look good, you can easily drop in another card and experiment with multi-GPU programming. The primary constraints on installing multiple GPUs are physical space and power. A high-end card like the GTX 280 can draw almost 200 W when in use, and it requires both 6-pin and 8-pin PCI-E power connectors, so spec the PSU accordingly.
Without a lot more planning, it is hard to put more than two of the high-end GPUs into a standard workstation since each card is two PCI-E slots wide and draws so much power. It is reasonable to assume that’s the upper limit for what you could put into a cheap workstation. Therefore, you’d probably want a computer with two PCI-Express 2.0 x16 (physical and electrical) slots that are separated by at least one other slot.
The suggestion that you have one CPU core per GPU comes from the fact that the CUDA driver spins in a loop polling the card to determine when it is finished with a kernel. (Kernel calls are asynchronous, but if you then do a cudaMemcpy(), the driver will block while the kernel finishes.) The hot spin is designed to minimize latency in detecting a finished kernel. So I think a dual-core CPU is fine, and a quad-core could be handy if you plan to do simultaneous CPU and GPU calculations.
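To illustrate the asynchronous launch / blocking copy behavior described above, here is a small sketch (kernel and sizes are just for illustration): the launch returns to the CPU immediately, while the subsequent cudaMemcpy blocks, with the driver spinning on one core, until the kernel has finished.

```cuda
// Sketch: kernel launches are asynchronous; cudaMemcpy blocks until
// the preceding kernel on the device has completed.
#include <stdlib.h>
#include <cuda_runtime.h>

__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main(void)
{
    const int n = 1 << 20;
    float *h = (float *)malloc(n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float *d;
    cudaMalloc((void **)&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);  // returns immediately
    // ... the CPU is free to do other work here while the GPU runs ...
    cudaMemcpy(h, d, n * sizeof(float),           // blocks (driver hot-spins)
               cudaMemcpyDeviceToHost);           // until the kernel is done

    cudaFree(d);
    free(h);
    return 0;
}
```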
And, again, since you mention Linux, I can say from personal experience that 64-bit support for CUDA is excellent, and you should use one of the recommended 64-bit Linux distributions on the driver download page. (Note that the RHEL drivers should work fine on one of the free RHEL rebuild clones, like CentOS or Scientific Linux.) Just be sure to pick the same Linux distribution version that is listed. Newer distribution versions will sometimes have gcc incompatibilities with the CUDA toolkit, and you don't need the headache of fighting that while you're learning CUDA.
I don’t know anything about current Intel motherboards, so I can’t speak to that question. Hopefully someone else can make suggestions there.
Just to add something … PCI-X and PCI-Express are very different. PCI-Express is what you’re looking for with graphics cards (PCI-Express x16 2.0 to be exact). Just didn’t want you to go looking for a motherboard with PCI-X and be disappointed when your card doesn’t fit.
Also, it would be pretty helpful if you would define “low-cost”, as this can mean many different things to many different people.
CUDA dev box up and running: an MSI SLI mobo with one ASUS 9500GT card for the moment. OS is Ubuntu 7.10 Server Edition.
Installed the driver, toolkit, and SDK… having problems compiling the examples; there seem to be some missing headers. The search facility on these forums doesn't seem to work for some reason? Will post the issues in a separate thread.
I would also like to build a CUDA box, but my requirements differ somewhat.
For our purposes we chose the GTX 280 (our problem needs double precision), possibly with two cards in one box.
The main questions are the PSU wattage, the chassis, and the cooling system for this box.
I suppose the PSU must be at least 1 kW and the chassis a full tower, but I don't really know what to do with almost 750 W of heat.
hi, another quick question…
the SLI mobo came with this connector thing to connect 2 GPUs together, presumably for SLI.
I'm assuming that when I get a second GPU it will be treated as an independent device, but I wondered: if I connect the two together with the connector, can the two devices be seen by CUDA as a single device? E.g., if I got a 9600GT as well as my 9500GT, could they appear as a single 96-core device?
No: CUDA always sees each GPU as a separate device; the SLI bridge is only used for graphics rendering, so the two cards won't merge into one 96-core device. As for mixing different cards, yes, it is true that the manual says identical devices are required, but in practice I have never seen this be a problem. For many months I used an 8800 GTX and a GT200 card in the same computer, and other people on the forum have also successfully mixed cards. (If someone has bumped into an incompatibility, that would be very interesting to know.)
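For what it's worth, you can see how CUDA enumerates the cards as independent devices with a short sketch like this. In the current runtime, a host thread is bound to one device via cudaSetDevice(), so multi-GPU code typically spawns one CPU thread per GPU:

```cuda
// Sketch: CUDA presents each GPU as an independent device, whether or
// not an SLI bridge is installed.
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("CUDA sees %d separate device(s):\n", count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("  device %d: %s (%d multiprocessors)\n",
               dev, prop.name, prop.multiProcessorCount);
    }
    // In a worker thread dedicated to device 1, you would call:
    //   cudaSetDevice(1);  // before any other CUDA call in that thread
    return 0;
}
```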
Anybody considering water cooling might want to take a look at my first attempt…see the link below.
Sure, I put a lot of grant money into it ($4.5k, $1.6k in GTX 280s alone), but I'm very happy with what I've learned about PCIe, bandwidth, and multi-GPU setups. I think I'm getting very good performance so far using the SDK examples…
CUDA should be great for financial applications, I can think of some optimization problems that are easily parallelizable… good luck!
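As a flavor of the kind of finance workload that parallelizes trivially, here is a toy Monte Carlo sketch for a European call: each thread computes one discounted payoff. All parameters are illustrative, and the host-side RNG is just for simplicity (real code would generate the draws on the GPU):

```cuda
// Toy Monte Carlo pricer for a European call (illustrative only).
// Each thread computes the discounted payoff for one simulated path.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <cuda_runtime.h>

__global__ void payoff_kernel(const float *z, float *payoff, int n,
                              float s0, float k, float r, float sigma, float t)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Terminal price under geometric Brownian motion, then payoff.
        float st = s0 * expf((r - 0.5f * sigma * sigma) * t
                             + sigma * sqrtf(t) * z[i]);
        payoff[i] = expf(-r * t) * fmaxf(st - k, 0.0f);
    }
}

int main(void)
{
    const int n = 1 << 20;
    const float two_pi = 6.2831853f;
    float *z = (float *)malloc(n * sizeof(float));
    float *p = (float *)malloc(n * sizeof(float));

    // Box-Muller: standard normal draws, generated on the host for simplicity.
    for (int i = 0; i < n; i += 2) {
        float u1 = (rand() + 1.0f) / (RAND_MAX + 2.0f);
        float u2 = (rand() + 1.0f) / (RAND_MAX + 2.0f);
        float rr = sqrtf(-2.0f * logf(u1));
        z[i]     = rr * cosf(two_pi * u2);
        z[i + 1] = rr * sinf(two_pi * u2);
    }

    float *dz, *dp;
    cudaMalloc((void **)&dz, n * sizeof(float));
    cudaMalloc((void **)&dp, n * sizeof(float));
    cudaMemcpy(dz, z, n * sizeof(float), cudaMemcpyHostToDevice);

    // Illustrative parameters: S0=100, K=100, r=5%, sigma=20%, T=1 year.
    payoff_kernel<<<(n + 255) / 256, 256>>>(dz, dp, n,
                                            100.0f, 100.0f, 0.05f, 0.2f, 1.0f);
    cudaMemcpy(p, dp, n * sizeof(float), cudaMemcpyDeviceToHost);

    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += p[i];
    printf("estimated call price: %f\n", sum / n);

    cudaFree(dz); cudaFree(dp); free(z); free(p);
    return 0;
}
```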