I have to do image processing on huge data (8k × 200k).
Data transfer is the bottleneck of my project.
I use an Intel X58 chipset and allocate my buffers as pinned memory, which gives about 5GB/s transfer speed (both up and down).
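For reference, I measure that figure with something like the following (a simplified sketch, not my exact project code; the buffer size is arbitrary):

#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    const size_t bytes = 64 << 20;               /* 64 MB test buffer                */
    float *h_pinned, *d_buf;
    cudaMallocHost((void**)&h_pinned, bytes);    /* page-locked (pinned) host memory */
    cudaMalloc((void**)&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaMemcpy(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Host-to-device: %.2f GB/s\n", (bytes / 1e9) / (ms / 1e3));

    cudaFreeHost(h_pinned);
    cudaFree(d_buf);
    return 0;
}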
I have 2 GTX 285 GPUs.
Is this already the best platform for the bandwidth issue?
Or are there already other, better chipsets that make the transfer speed faster?
My requirement is more PCI-E lanes, so that I can plug in more GTX 285 GPUs and reduce the number of PCs,
and bigger PCI-E bandwidth to transfer the data faster.
I also need one or two PCI-E 4x slots to plug in my frame grabbers.
Thanks.
PS. I visited NVIDIA’s website and saw some chipsets that also provide two PCI-E 2.0 16X slots (one for Intel Core 2 and one for AM3).
Are they faster than X58?
Intel X58 or AMD 790FX are, in my experience, the fastest chipsets for PCI-e bandwidth in CUDA. Both give sustained 5GB/s with pinned memory transfers, as you have discovered. There isn’t anything faster than what you already have, I am afraid.
For the third (and last) time: you have the fastest single-CPU platform there is.
5GB/s sustained is basically as fast as the PCI-e v2 standard can achieve in practice on a 16 lane link, once signaling overheads (almost 20% if I recall correctly) and latency are factored in. The Socket 1156 arrangement has slightly lower latency for a single 16x link, because the PCI-e controller is on the CPU silicon and bypasses the CPU external bus. But it is slower than X58 if you want more than one GPU.
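To put numbers on it: PCI-e 2.0 signals at 5 GT/s per lane, and the 8b/10b encoding eats 20% of that, leaving 500 MB/s per lane, or 8 GB/s each way on a 16 lane link. Packet headers, flow control and chipset latency then bring the sustained copy rate down to the 5-6 GB/s range you are measuring with pinned memory.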
To complete what everyone else said: you do (almost) have the fastest platform there is. You should be looking more at reducing the number of host<->GPU transfers. That solution will have an immediate impact on your computer, my computer, and anyone’s computer that uses a CUDA-enabled GPU.
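For example (just a sketch, assuming your 8k-pixel lines arrive one at a time from the frame grabber, and assuming 16-bit pixels): stage a whole batch of lines in one pinned buffer and copy it in a single transfer, rather than issuing one small copy per line. The bus only reaches its peak rate on large copies; many small ones pay the per-transfer latency every time.

#include <cuda_runtime.h>

/* Hypothetical sizes: 8k-pixel lines of 16-bit data (adjust to your format). */
#define LINE_BYTES       (8192 * sizeof(unsigned short))
#define LINES_PER_BATCH  1024

int main(void)
{
    unsigned char *h_staging, *d_batch;
    size_t batch_bytes = (size_t)LINES_PER_BATCH * LINE_BYTES;

    cudaMallocHost((void**)&h_staging, batch_bytes);  /* pinned staging buffer */
    cudaMalloc((void**)&d_batch, batch_bytes);

    /* ... fill h_staging from the frame grabber as lines arrive ... */

    /* One large transfer per batch instead of LINES_PER_BATCH small ones. */
    cudaMemcpy(d_batch, h_staging, batch_bytes, cudaMemcpyHostToDevice);

    /* ... launch the processing kernel on d_batch here ... */

    cudaFreeHost(h_staging);
    cudaFree(d_batch);
    return 0;
}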
If you really really need more bandwidth, there is a possibly better solution (note the “possibly”). You can split 16 PCI-E 2.0 lanes into 32 PCI-E 2.0 lanes with the nForce 200 chipset. Note, however, that this will only give you faster overall performance if you are transferring data to/from GPU1 at a different time than to/from GPU2. You will still get 5GB/s transfer rates for each GPU. If, however, you transfer data to both GPUs on the nForce200 chip at the same time, then you will experience far lower transfer rates.
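To make the “different time” point concrete, here is a rough sketch (hypothetical chunk size, one host thread per GPU) where a mutex staggers the copies so the two GPUs behind the nForce 200 never compete for the shared upstream 16 lanes at the same instant:

#include <cuda_runtime.h>
#include <pthread.h>

#define CHUNK_BYTES (64 << 20)                        /* hypothetical chunk size */

static pthread_mutex_t pcie_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    int dev = *(int *)arg;
    unsigned char *h_buf, *d_buf;

    cudaSetDevice(dev);                               /* one context per thread  */
    cudaMallocHost((void**)&h_buf, CHUNK_BYTES);      /* pinned host buffer      */
    cudaMalloc((void**)&d_buf, CHUNK_BYTES);

    pthread_mutex_lock(&pcie_lock);                   /* stagger the transfers   */
    cudaMemcpy(d_buf, h_buf, CHUNK_BYTES, cudaMemcpyHostToDevice);
    pthread_mutex_unlock(&pcie_lock);

    /* ... kernels on d_buf run here; compute on the two GPUs overlaps freely ... */

    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return NULL;
}

int main(void)
{
    pthread_t threads[2];
    int ids[2] = {0, 1};
    for (int i = 0; i < 2; ++i)
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    for (int i = 0; i < 2; ++i)
        pthread_join(threads[i], NULL);
    return 0;
}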
This solution is both cumbersome and expensive. Still, it is useful if you need to connect four behemoths in the same system. The only motherboard that I know of that allows 4 GPUs to be connected at full 16x PCI-E 2.0 is the ASUS P6T7 WS SuperComputer (there may be others that I’m unaware of). It uses two nForce 200 chips. Even so, you will not see more than 5GB/s transfer rate per GPU.
Also make sure you are using triple-channel DDR3-1600 memory with your i7 (though keep your DRAM voltage below 1.55V). In my experience, memory speed on X58 has a direct effect on PCI-E transfer performance.
There’s also the TYAN S7025AGM2NR board, which has 4 PCIe 2.0 x16 slots (and supports dual Core-i7 Xeons), but I think it’s pretty new (since I haven’t heard anyone mention it yet).
I see the TYAN board uses dual 5520 IOHs, which both have 36 lanes of PCI-E 2.0 connectivity, so that’s way better than the ASUS board I mentioned. The price seems surprisingly acceptable as well.
Does anyone know what PCI-E configuration the EVGA board Talonman mentioned has?
Regarding other dual-socket boards, I seem to recall threads in the forum where people were not getting the Host-to-Device/Device-to-Host bandwidth they were expecting, even when setting CPU affinity on the process. You’ll have to google around.
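(On Linux, binding both the process and its memory to one socket is usually done with something like numactl --cpunodebind=0 --membind=0 ./your_app; which node number sits closest to which IOH depends on the board, so treat the 0 as a placeholder.)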