Any tips for configuring a new CUDA computer?

I’m looking to configure and buy a new computer to do some data processing and would like to use CUDA on it. So far, I’ve looked a little at the Dell Precision T7500 or the HP Z800. I think I can configure them to match my needs but I’d appreciate any tips that y’all may have on how to configure it. I’m especially wary of any pitfalls because my schedule is fairly tight. I can’t afford to spend too much time mucking around in hardware or configuration issues.

  1. I’m looking at getting some Windows 7 x64 variant. Is there any real difference between Pro or Ultimate? Running Linux may be a possibility, but I’d prefer not to due to schedule reasons.

  2. I’ve got an incoming data stream of ~6.5 Gb/s and need to process that into an output stream at ~2.5 Gb/s. The processing is not all that intensive and has significant data parallelism. I suspect that the computation may exceed the capabilities of a dual quad-core Xeon system, so I want to get a single CUDA-capable card to offload the work. I expect the computational requirements to grow in the future and would like some extra headroom.

  3. The system needs at least 2 PCI-e x4 slots in addition to anything that may be used up by the graphics and CUDA cards, and I’d prefer to have some extra slots for future needs. I’d prefer that these slots be Gen 2. I actually don’t mind if the graphics can be run off a PCI slot to free up an extra PCI-e slot. I saw some motherboards that seem to be designed for CUDA clusters with a bunch of PCI-e x16 slots, but I don’t know how to locate a reputable vendor of complete systems using those.

Thanks in advance!

The main advantage of Linux is the reduced overhead of CUDA kernel calls, since Linux avoids the WDDM (Windows Display Driver Model). If you are planning to buy Tesla cards, then NVIDIA provides a special Windows driver, the TCC (Tesla Compute Cluster) driver, that runs the Tesla card outside the graphics subsystem without the WDDM overhead. I don’t know about the Pro vs. Ultimate differences.

Assuming you are using Gb to mean gigabits, that sounds like it should be doable with plenty of headroom if you are careful with buffering. You’ll want to use CUDA streams to ensure that you overlap data transfer with computation. This is also a place where the Tesla C2050 or C2070 cards would be useful. Unlike the regular GeForce cards, the Fermi-based Tesla cards have two separate DMA engines, which can perform transfers between GPU and host memory in both directions at once while the GPU performs computations. With a triple-buffering scheme, you should then be able to pipeline data pretty efficiently. Do you have a strong latency requirement?
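For scale: 6.5 Gbit/s is roughly 0.8 GB/s, well under what a PCI-e Gen 2 x16 link can sustain (~8 GB/s theoretical, typically 5–6 GB/s in practice with pinned memory). A rough sketch of the stream-based pipeline might look like the following. This is just an illustration, not a drop-in solution: `process_chunk`, the chunk size `N`, and the buffer count are all placeholders for your actual kernel and data format.

```cuda
// Sketch of a triple-buffered pipeline: while one chunk is being copied to
// the GPU, another is being processed, and a third is being copied back.
// Pinned host memory and per-buffer streams are what let the copies overlap.
#include <cuda_runtime.h>

#define NBUF 3
#define N (1 << 20)   // elements per chunk -- placeholder value

__global__ void process_chunk(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;   // stand-in for the real processing
}

int main() {
    float *h_in[NBUF], *h_out[NBUF], *d_in[NBUF], *d_out[NBUF];
    cudaStream_t stream[NBUF];

    for (int b = 0; b < NBUF; ++b) {
        // Pinned (page-locked) host memory is required for async copies.
        cudaMallocHost(&h_in[b],  N * sizeof(float));
        cudaMallocHost(&h_out[b], N * sizeof(float));
        cudaMalloc(&d_in[b],  N * sizeof(float));
        cudaMalloc(&d_out[b], N * sizeof(float));
        cudaStreamCreate(&stream[b]);
    }

    int nchunks = 100;   // however many chunks arrive from the input stream
    for (int c = 0; c < nchunks; ++c) {
        int b = c % NBUF;
        // Wait for buffer b's previous round trip before reusing it.
        cudaStreamSynchronize(stream[b]);
        // ... refill h_in[b] from the input stream, drain h_out[b] ...
        cudaMemcpyAsync(d_in[b], h_in[b], N * sizeof(float),
                        cudaMemcpyHostToDevice, stream[b]);
        process_chunk<<<(N + 255) / 256, 256, 0, stream[b]>>>(d_in[b], d_out[b], N);
        cudaMemcpyAsync(h_out[b], d_out[b], N * sizeof(float),
                        cudaMemcpyDeviceToHost, stream[b]);
    }
    cudaDeviceSynchronize();
    return 0;
}
```

On a GeForce card with a single copy engine, the uploads and downloads in different streams still serialize against each other; the dual DMA engines on the Fermi Teslas are what make the full three-way overlap possible.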

NVIDIA has a list of vendors that sell workstations preconfigured with Tesla cards:

Some of those vendors sell systems that can support 1 to 4 (maybe more) devices. One of them should be able to build a custom system to meet your requirements.