What hardware to get?

I’m planning to buy a Tesla C1060 and put together a stand-alone computer for the sole purpose of running computations. I figure I’ll need the C1060 for its 4 GB of memory, as my computations use a lot of memory (I currently run them on CPUs). I’m just starting out with CUDA, so I’ve got a lot to learn.

I’m curious: what CPU (speed, number of cores), RAM (speed/size) and motherboard (speed, components) do I need for optimal computational performance from the Tesla/GPU? Does it matter whether I need to do a lot of round trips to and from the GPU versus rarely? If I’m able to run all computations on the GPU, does it even matter what CPU and RAM I use, as long as I have a PCIe slot? I want the best performance without any “overkill” hardware.

If you run everything on the GPU, the CPU matters not at all. I haven’t had a chance to play with the new GT200 chips yet, but my 8800 GTX system is running on a roughly four-year-old CPU and I don’t notice any performance difference compared to systems with modern CPUs.

Well, I shouldn’t say not at all: if you are running a multi-GPU system, you need one CPU core per GPU, since CUDA spin-waits for tasks to finish on the GPU.

If you need the best CPU->GPU memory copy performance, be careful which motherboard chipset you get. You can find benchmarks scattered around the forums for some of them, but the PCIe v2 Intel chipsets have consistently outperformed the PCIe v2 NVIDIA chipsets (to the tune of 6 GiB/s vs 4 GiB/s). If you don’t care too much about memory copy performance, then any motherboard with a PCIe x16 slot will do.

Thanks for the response. Do you know what RAM was used to get 6 GiB/s? I wonder if using faster DDR3 memory improves PCIe performance…

Tom’s hardware RAM speed test

I know some of the benchmarks reaching that level were with DDR2; I’m not sure if there were any with DDR3. PCIe gen2 x16 only provides 8 GiB/s theoretical. Mark Harris says (http://forums.nvidia.com/index.php?showtopic=31108&view=findpost&p=193201) that there is 15% overhead on PCIe transfers, so the maximum you can expect is 6.8 GiB/s, assuming the chipset can deliver that in the first place (something the 780i’s NF200 bridge apparently limits).

So DDR2-800 and DDR3-800 (max 6.4 GB/s) are not optimal, but anything faster should be enough. I don’t know if transfer from RAM to GPU depends on things such as FSB speed. “Stumbled” upon this:


Slowly starting to see what kind of hardware I need…

I get 4.6 GB/s HtoD and DtoH on an X38 motherboard with DDR2 and an 8800 GT. At work, I get 6 GB/s on whatever Xeon motherboard is in my machine (with FBDIMMs and all that) to a C1060.

I just built a GTX 280 workstation today, and got 6.3 GB/sec DtoH, 5.3 GB/sec HtoD pinned memory bandwidth. This is with a 2.6 GHz quad-core Phenom on the MSI K9A2 Platinum (AMD 790FX chipset) and some cheap DDR2-800 memory. So it looks like this motherboard can saturate the DDR2 memory bandwidth, at least in one direction.

(One warning about the MSI board: for some reason it defaulted to disabling PCIe 2.0 rather than auto-detecting the card’s capability. I just had to flip that option in the BIOS, and then I got full speed.)
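For anyone wanting to reproduce these numbers, here is a minimal sketch of a pinned-memory host-to-device bandwidth test (essentially a stripped-down version of the SDK’s bandwidthTest; compile with nvcc, and note it obviously needs a CUDA-capable GPU to run):

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    const size_t bytes = 64 << 20;   /* 64 MiB per transfer */
    const int reps = 10;
    void *host, *dev;

    /* Pinned (page-locked) host memory -- required to reach peak PCIe throughput */
    cudaMallocHost(&host, bytes);
    cudaMalloc(&dev, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    for (int i = 0; i < reps; ++i)
        cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("HtoD: %.2f GB/s\n", (double)reps * bytes / (ms * 1e-3) / 1e9);

    cudaFreeHost(host);
    cudaFree(dev);
    return 0;
}
```

Swap the arguments of the cudaMemcpy (and use cudaMemcpyDeviceToHost) to measure the DtoH direction; replacing cudaMallocHost with plain malloc shows how much pageable memory costs you.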