I need to build a machine specifically for running a CUDA app. The app will also do quite a bit of CPU processing and needs a LOT of RAM.
My thinking has been to get this board:
And then populate it with two to four quad-core Opterons and eventually, I hope, the full 256GB of RAM (though I’m going to wait a year or so until the price of 8GB RAM modules drops significantly).
Currently I’m planning to use two 280s, though this purchase is several months off and I’m wondering if there’s any truth to the rumors of the 350s being released in Q4…
So, here are my questions:
1> Each card is going to peg a CPU core, correct? And it pegs only one core per card? So with two quad-core CPUs in the machine, I should still have 6 cores available while CUDA calls are being made?
2> Is there any way to estimate how quickly this board will transfer data between host and device? If not, can anyone recommend a board that holds at least 64GB of RAM, has at least two PCIe x16 slots, and is known for fast transfers?
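For context, the only estimate I can make myself is a theoretical ceiling from the PCIe generation and lane count. Below is a minimal sketch of that arithmetic, assuming 8b/10b line encoding (PCIe 1.x and 2.0). The function names are my own, and I expect real host-to-device copies to come in well below these figures depending on the chipset and on whether the host buffer is pinned:

```python
def pcie_peak_gb_per_s(gigatransfers_per_s, lanes):
    """Theoretical one-direction PCIe bandwidth in GB/s (1 GB = 10**9 bytes).

    Assumes 8b/10b line encoding, as used by PCIe 1.x and 2.0, so only
    8 of every 10 bits on the wire carry payload.
    """
    payload_gbit_per_lane = gigatransfers_per_s * 8 / 10
    return payload_gbit_per_lane / 8 * lanes  # bits -> bytes, times lane count


def best_case_seconds(gigabytes, gb_per_s):
    """Lower bound on transfer time; real copies will be slower."""
    return gigabytes / gb_per_s


if __name__ == "__main__":
    gen1 = pcie_peak_gb_per_s(2.5, 16)  # PCIe 1.x, x16 slot
    gen2 = pcie_peak_gb_per_s(5.0, 16)  # PCIe 2.0, x16 slot
    print(f"PCIe 1.x x16 peak: {gen1:.1f} GB/s")
    print(f"PCIe 2.0 x16 peak: {gen2:.1f} GB/s")
    print(f"1 GB over PCIe 2.0 x16: at least {best_case_seconds(1.0, gen2):.3f} s")
```

This only bounds the link itself; it says nothing about how a particular board's chipset behaves, which is what I'm really asking about.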
3> When using multiple CUDA devices, does anyone have experience optimizing memory transfers? For example, would it be best to have 2 cores copying data to both cards simultaneously, or would it be better to do some sort of interleaving like this:
Thread 1> Data Copy CPU to GPU #1
Thread 1> Launch kernel | Thread 2> Data Copy CPU to GPU #2
Thread 2> Launch kernel | Thread 1> Data Copy GPU #1 to CPU
Thread 2> Data Copy GPU #2 to CPU
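To make the comparison concrete, here is a toy timing model of the two options. It is only a sketch under my own assumptions: the function names are hypothetical, every duration is a made-up input in arbitrary time units, concurrent copies over a shared bottleneck are assumed to take twice as long each, and the interleaved schedule is assumed to allow only one transfer on the link at a time. Whether the two cards actually contend for bandwidth on a given board is exactly what I don't know.

```python
def simultaneous_makespan(copy_in, kernel, copy_out, shared_link=True):
    """Both threads copy at once, both kernels run, then both copy back.

    With shared_link=True the two cards are assumed to split one
    bottleneck, so each concurrent copy takes twice as long.
    """
    slowdown = 2 if shared_link else 1
    return slowdown * copy_in + kernel + slowdown * copy_out


def interleaved_makespan(copy_in, kernel, copy_out):
    """Interleaved schedule, with only one transfer on the link at a time."""
    in1_done = copy_in                  # CPU -> GPU #1
    k1_done = in1_done + kernel         # kernel on GPU #1
    in2_done = in1_done + copy_in       # CPU -> GPU #2, overlapping kernel 1
    k2_done = in2_done + kernel         # kernel on GPU #2
    # GPU #1 -> CPU starts once kernel 1 is done and the link is free.
    out1_done = max(k1_done, in2_done) + copy_out
    # GPU #2 -> CPU starts once kernel 2 is done and the link is free.
    out2_done = max(k2_done, out1_done) + copy_out
    return out2_done


if __name__ == "__main__":
    copy_in, kernel, copy_out = 1, 4, 1  # hypothetical durations
    print("simultaneous, shared link:     ",
          simultaneous_makespan(copy_in, kernel, copy_out))
    print("simultaneous, independent links:",
          simultaneous_makespan(copy_in, kernel, copy_out, shared_link=False))
    print("interleaved:                   ",
          interleaved_makespan(copy_in, kernel, copy_out))
```

In this model, interleaving only helps when the copies really do contend for the same bandwidth; if each card gets its full link, copying to both at once finishes sooner.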
Thanks for any help anyone can provide…