Will this hardware configuration work? First system to evaluate CUDA for scientific computing

I am looking to evaluate CUDA for scientific applications (mostly molecular dynamics, also large data sets involving nonlinear regression). I am an old FORTRAN programmer and have been using MATLAB lately, but I am not experienced with parallel programming, CUDA, etc.

I have a new grad student and want him to get started on CUDA. With grad students, it takes a few years to come up to speed, and I assume the technology will be faster and maybe cheaper by then, so the idea is to configure a workable system to get us through the learning curve, then drop bigger $$ once we are able to take advantage of a better system.

I’m wondering how the following would fare? An MSI K9N2 Diamond motherboard (SLI 780a, 3 PCIe 2.0 slots), AMD Phenom X4 9950 (2.6 GHz), 8 GB DDR2-800 RAM, a 650 W or higher power supply, one or two EVGA GTX 285 2 GB cards, and a cheaper card for display (maybe a 9500 GT with 1 GB).

The idea is to combine the increased speed from CUDA with a cost-effective system to learn the ropes on. Any help would be greatly appreciated.


Looks reasonable; one of my CUDA workstations (picked up while I was a grad student) is a Phenom 9900 + MSI K9A2 with the 790FX chipset. Though if you are building a new AMD system, I would spring for a Phenom II X4 processor, which fits the same motherboard.

The 2.67 GHz Core i7 is also a fantastic chip, especially if you want to experiment with OpenMP-style programming on the CPU side. Hyperthreading makes the quad-core Core i7 look like 8 “logical” cores, and using 8 threads instead of 4 can get you up to a 50% boost in throughput. For CUDA, the Core i7 systems generally post the highest host-to-device and device-to-host bandwidth numbers, which matters if you will be moving lots of data to/from the card repeatedly; the AMD systems usually have 10-20% lower HtoD and DtoH bandwidths. Core i7 systems are a little more expensive, so you might prefer the AMD system to start with. However, if/when you ramp up your CUDA usage in the future, the Core i7 can’t be beat.
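Those HtoD/DtoH numbers are easy to measure on whatever box you end up with. A minimal sketch (similar in spirit to the bandwidthTest sample that ships with the CUDA SDK) that times a pageable host-to-device copy with CUDA events; buffer size and other details are my own choices, not from this thread:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main(void) {
    const size_t bytes = 64 * 1024 * 1024;   // 64 MB test buffer (arbitrary)
    float *h_buf = (float *)malloc(bytes);   // pageable host memory
    float *d_buf;
    cudaMalloc((void **)&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Time a single host-to-device copy
    cudaEventRecord(start, 0);
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("HtoD bandwidth: %.2f GB/s\n", (bytes / 1.0e9) / (ms / 1000.0));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_buf);
    free(h_buf);
    return 0;
}
```

Averaging over many copies (and a larger buffer) gives steadier numbers than a single transfer.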

I have 3 CUDA workstations: Phenom 9900 2.6 GHz, Phenom II X4 3.0 GHz, and a Core i7 2.67 GHz. Any of them would make a good CUDA test workstation (though the ASRock motherboard on the Core i7 still has weird non-CUDA issues).

My only other suggestion would be to verify the number of PCI-Express power plugs on the PSU before you buy it. Each GTX 285 needs two of the 6-pin kind (though most PSUs let you snap two pins off their 8-pin PCI-E plugs to fit a 6-pin socket). The 9500 GT requires one 6-pin PCI-Express plug. You might not find 5 PCI-E plugs on a 650 W PSU, but I imagine it is pretty common on 750 or 850 W PSUs.

  1. If you’re buying a 780a board, don’t buy a separate GPU for display; the chipset has integrated graphics.
  2. I would go Core i7 just because the pageable transfer performance is so much better. If you’re porting an app and don’t want to use cudaMallocHost everywhere, you’ll see much better transfer performance on an i7 than on anything else: ~5 GB/s instead of ~2 GB/s, for a properly configured triple-channel machine with decently performant DDR3.
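To unpack the cudaMallocHost point: pinned (page-locked) host memory lets the GPU DMA directly from it, while pageable malloc’d memory has to be staged through the driver, so transfer speed then depends on host memory performance. A minimal sketch of the two allocation paths, with sizes chosen arbitrarily for illustration:

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

int main(void) {
    const size_t bytes = 32 * 1024 * 1024;
    float *d_buf;
    cudaMalloc((void **)&d_buf, bytes);

    // Pageable allocation: the driver stages the copy through an internal
    // buffer, so bandwidth depends heavily on host memory performance.
    float *pageable = (float *)malloc(bytes);
    cudaMemcpy(d_buf, pageable, bytes, cudaMemcpyHostToDevice);

    // Pinned (page-locked) allocation: the GPU can DMA directly from it,
    // typically approaching full PCI-Express bandwidth on any host.
    float *pinned;
    cudaMallocHost((void **)&pinned, bytes);
    cudaMemcpy(d_buf, pinned, bytes, cudaMemcpyHostToDevice);

    cudaFreeHost(pinned);
    free(pageable);
    cudaFree(d_buf);
    return 0;
}
```

Pinned memory is a scarce resource (it can’t be swapped out), so it’s best used for staging buffers rather than for every allocation in a ported app.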

Minor nitpick, but the 9500 GT (and the 8600 GT it is based on) doesn’t require any external power. It is kind of irrelevant anyway, though: the 780a has a 9300-class GPU in the north bridge, so an additional display GPU isn’t necessary.

Oops, my bad. I looked up the 9500 GT on Google, but didn’t realize I was looking at a review of an overclocked 9500 GT that requires a 6-pin plug. :)

I would also recommend getting two identical CUDA-capable cards. Multi-GPU programming adds an extra layer of complexity, and it’s probably best to address it right from the beginning rather than bolt it on after the fact.
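To give a flavor of that extra layer: in the current runtime a CUDA context belongs to one host thread, so a two-GPU app typically spawns one worker thread per card and binds each with cudaSetDevice. A hedged sketch using pthreads (error checking and the actual kernel work omitted):

```cuda
#include <cstdio>
#include <pthread.h>
#include <cuda_runtime.h>

// Each worker thread owns one GPU; the context is per-thread,
// so device selection has to happen inside the thread itself.
void *worker(void *arg) {
    int dev = *(int *)arg;
    cudaSetDevice(dev);          // bind this thread to its GPU

    float *d_buf;
    cudaMalloc((void **)&d_buf, 1024 * sizeof(float));
    // ... launch kernels on this device's share of the problem ...
    cudaFree(d_buf);
    return NULL;
}

int main(void) {
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("Found %d CUDA device(s)\n", count);

    pthread_t threads[2];
    int ids[2] = {0, 1};
    for (int i = 0; i < 2 && i < count; ++i)
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    for (int i = 0; i < 2 && i < count; ++i)
        pthread_join(threads[i], NULL);
    return 0;
}
```

With identical cards the same code path works for both threads, which is one reason matching GPUs makes the learning curve gentler.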