The fastest platform for GPU computing

I have to do image processing on a huge dataset (8k × 200k).
Data transfer is the bottleneck of my project.
I use an Intel X58 chipset and allocate my buffers as pinned memory, which gives about 5 GB/s transfer speed (both upload and download).
I have 2 GTX 285 GPUs.

Is this already the best platform as far as bandwidth goes?
Or are there already other chipsets that make transfers faster?

My requirement is more PCI-E lanes, so that I can plug in more GTX 285 GPUs and reduce the number of PCs.
And more PCI-E bandwidth, so data transfers are faster.

I also need one or two PCI-E 4x slots for my frame grabbers.

Thanks.

PS. I visited NVIDIA’s website and saw some chipsets that also provide two PCI-E 2.0 16x slots (one for Intel Core 2 and one for AM3).
Are they faster than the X58?

The Intel X58 and AMD 790XT are, in my experience, the fastest chipsets for PCI-E bandwidth in CUDA. Both give a sustained 5 GB/s with pinned memory transfers, as you have discovered. There isn’t anything faster than what you already have, I am afraid.
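
For reference, the ~5 GB/s figure is normally measured the way the SDK’s bandwidthTest does it: one large pinned host buffer, one big cudaMemcpy, and CUDA event timing. A minimal sketch (the 64 MB buffer size is just an arbitrary choice, not anything specific to this setup):

[code]
/* Pinned-memory host-to-device bandwidth check, roughly what bandwidthTest does. */
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const size_t bytes = 64 << 20;                                   /* 64 MB test buffer */
    float *h_pinned, *d_buf;
    cudaHostAlloc((void **)&h_pinned, bytes, cudaHostAllocDefault);  /* page-locked host memory */
    cudaMalloc((void **)&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaMemcpy(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Host->Device: %.2f GB/s\n", (bytes / 1.0e9) / (ms / 1000.0));

    cudaFreeHost(h_pinned);
    cudaFree(d_buf);
    return 0;
}
[/code]

Swap cudaHostAlloc for plain malloc and you will see the pageable-memory rate drop well below the pinned figure.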

Thanks for your reply.

Does AMD 790XT mean the AMD 790X chipset? (I cannot find an AMD 790XT chipset on AMD’s website.)

And how does the AMD 790X compare with the AMD 790FX?

Also, will they be better than the Intel X58 when using CUDA?

How about the Intel P55?

Thank you so much.

Sorry, perhaps I wasn’t clear enough. You already have the fastest single-CPU-socket, dual-GPU platform there is.

Thank you so much.

So even the AMD 790-series chipsets cannot be faster than the X58 for host-to-device copy speed, right?

What determines the transfer speed: the Core i7 or the X58 chipset?

If I change from the X58 to the P55 and still use a Core i7, will data transfer between host and device be slower?

For the third (and last) time: you have the fastest single-CPU platform there is.

5 GB/s sustained is basically as fast as the PCI-E v2 standard can achieve in practice on a 16-lane link, once signaling overheads (almost 20%, if I recall correctly) and latency are factored in. The Socket 1156 arrangement has slightly lower latency for a single 16x link, because the PCI-E controller is on the CPU silicon and bypasses the CPU’s external bus. But it is slower than the X58 if you want more than one GPU.
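
As a rough sanity check of those numbers (using only the published PCI-E 2.0 figures, nothing specific to this system): 16 lanes × 5 GT/s = 80 Gbit/s raw; the 8b/10b line encoding leaves 64 Gbit/s ≈ 8 GB/s of usable line rate per direction; and packet/protocol overhead plus latency typically brings the practically achievable pinned-memory rate down to roughly 5–6 GB/s, which is the regime being measured here.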

:)

5 GB/s is the max you can get for now… but if you want more, then wait for this:

http://www.pcisig.com/news_room/08_08_07/

Looks nice…

Sadly, you’ll have to wait a little bit longer:

http://www.tomshardware.com/news/PCI-Expre…Delay,8515.html

Official PCI-E 3.0 specs pushed back to Q2 2010, and actual products pushed to 2011.

@darot

To complete what everyone else has said: you do (almost) have the fastest platform there is. You should be looking more at reducing the number of host<->GPU transfers. That solution will have an immediate impact on your computer, my computer, and any computer with a CUDA-enabled GPU.
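
One way to attack the transfer cost (just a sketch under assumptions of my own: the chunking, the two-stream ping-pong, and the process_chunk kernel are placeholders, not anything from your code) is to stage chunks through pinned memory and overlap uploads with the kernel working on the previous chunk, using cudaMemcpyAsync and streams:

[code]
#include <cstring>
#include <cuda_runtime.h>

/* Placeholder kernel standing in for the real image processing. */
__global__ void process_chunk(float *data, size_t n)
{
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

void run(const float *h_src, size_t n_chunks, size_t chunk_elems)
{
    const size_t chunk_bytes = chunk_elems * sizeof(float);

    float *h_pinned;                                   /* page-locked staging buffer */
    cudaHostAlloc((void **)&h_pinned, 2 * chunk_bytes, cudaHostAllocDefault);

    float *d_buf[2];
    cudaStream_t stream[2];
    for (int i = 0; i < 2; ++i) {
        cudaMalloc((void **)&d_buf[i], chunk_bytes);
        cudaStreamCreate(&stream[i]);
    }

    for (size_t c = 0; c < n_chunks; ++c) {
        int s = (int)(c & 1);                          /* ping-pong between two streams */
        float *h_stage = h_pinned + s * chunk_elems;

        /* wait until the previous async copy out of this staging half has finished */
        cudaStreamSynchronize(stream[s]);
        memcpy(h_stage, h_src + c * chunk_elems, chunk_bytes);

        /* async upload + kernel in the same stream: the upload of chunk c can
           overlap the kernel still running on chunk c-1 in the other stream */
        cudaMemcpyAsync(d_buf[s], h_stage, chunk_bytes,
                        cudaMemcpyHostToDevice, stream[s]);
        process_chunk<<<(unsigned)((chunk_elems + 255) / 256), 256, 0, stream[s]>>>(
            d_buf[s], chunk_elems);
    }
    cudaDeviceSynchronize();

    for (int i = 0; i < 2; ++i) {
        cudaStreamDestroy(stream[i]);
        cudaFree(d_buf[i]);
    }
    cudaFreeHost(h_pinned);
}
[/code]

On a GTX 285 (one copy engine) this hides the upload behind the compute rather than making the link itself faster, which is usually the bigger win anyway.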

If you really, really need more bandwidth, there is a possibly better solution (note the “possibly”). You can split 16 PCI-E 2.0 lanes into 32 PCI-E 2.0 lanes with the nForce 200 chip. Note, however, that this will only give you faster overall performance if you are transferring data to/from GPU1 at a different time than to/from GPU2. You will still get 5 GB/s transfer rates for each GPU. If, however, you transfer data to both GPUs on the nForce 200 chip at the same time, then you will see far lower transfer rates.
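
To make that “at a different time” caveat concrete, here is a trivial sketch (modern runtime-API style with one host thread switching devices; in the CUDA 2.x days you would use one host thread per GPU instead, and d_a/d_b are assumed to have been allocated on GPU 0 and GPU 1 respectively). Because cudaMemcpy is synchronous, the two uploads never run at the same moment, so each gets the full bandwidth of the shared x16 uplink:

[code]
#include <cuda_runtime.h>

void upload_staggered(const float *h_a, const float *h_b,
                      float *d_a, float *d_b, size_t bytes)
{
    cudaSetDevice(0);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);   /* finishes completely... */

    cudaSetDevice(1);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);   /* ...before this one starts */
}
[/code]

Issue two cudaMemcpyAsync calls from pinned memory to both GPUs at once instead, and they will contend for the same 16 upstream lanes of the nForce 200.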

This solution is both cumbersome and expensive. Still, it is useful if you need to connect four behemoths in the same system. The only motherboard that I know of that allows 4 GPUs to be connected at full 16x PCI-E 2.0 is the ASUS P6T7 WS SuperComputer (there may be others that I’m unaware of). It uses two nForce 200 chips. Still, you will not see more than 5 GB/s transfer rate per GPU.

Also make sure you are using triple-channel DDR3-1600 memory with your i7 (though keep your DRAM voltage below 1.55V). In my experience, memory speed on X58 has a direct effect on PCI-E transfer performance.

Best wishes,
Alex

There’s also the TYAN S7025AGM2NR board, which has 4 PCIe 2.0 x16 slots (and supports dual Core-i7 Xeons), but I think it’s pretty new (since I haven’t heard anyone mention it yet).

Given the problem reports with other dual X58 NUMA systems, I hope someone gets their hands on this and does some CUDA testing with it.

Especially given that it’s advertised as “Certified with NVIDIA Tesla C1060 & S1070 computing system”: http://www.newegg.com/Product/Product.aspx…N82E16813151208

As we are looking at this board: what kind of problems have been experienced?

I am waiting on this dual CPU’ed, over-clockable beauty.

[url=“http://www.evga.com/FORUMS/tm.aspx?&m=107186&mpage=1”]http://www.evga.com/FORUMS/tm.aspx?&m=107186&mpage=1[/url]

It will most likely be my next build.

I see the TYAN board uses dual 5520 IOHs, which both have 36 lanes of PCI-E 2.0 connectivity, so that’s way better than the ASUS board I mentioned. The price seems surprisingly acceptable as well.

Does anyone know what PCI-E configuration the EVGA board Talonman mentioned has?

One more thread with an XS link in it to check out too:
[url=“http://www.evga.com/FORUMS/tm.aspx?m=117952”]http://www.evga.com/FORUMS/tm.aspx?m=117952[/url]

Regarding other dual-socket boards, I seem to recall threads in the forum where people were not getting the Host-to-Device/Device-to-Host bandwidth they were expecting, even when setting CPU affinity on the process. You’ll have to google around.
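
For anyone experimenting with those dual-IOH boards: the usual trick is to bind the process to the socket whose IOH the GPU hangs off before the CUDA context is created, so the pinned buffers end up in local memory. A sketch under assumptions of my own (the core numbering and the GPU-to-socket mapping below are made up; check yours with numactl or /proc/cpuinfo):

[code]
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    for (int core = 0; core < 4; ++core)    /* assumed: cores 0-3 sit on socket 0 */
        CPU_SET(core, &mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0)
        perror("sched_setaffinity");

    cudaSetDevice(0);                       /* assumed: GPU 0 is attached to socket 0's IOH */
    /* ... cudaHostAlloc the pinned buffers and run the transfers from here ... */
    return 0;
}
[/code]

Even with the affinity set, the reports mentioned above suggest the dual-chipset boards still fell short of the single-X58 numbers, so treat this as necessary rather than sufficient.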

Dual-socket single X58 boards are fine, dual-socket dual-X58 boards are not.

Here’s the thread discussing the performance impact of the dual chipset.

Note that the new EVGA board uses dual nf200 chips. Do you think that will also be fine getting expected Host-to-Device/Device-to-Host bandwidth?

A Youtube video just to help get a better grip on the size of this beast… ;)

http://www.youtube.com/watch?v=-16R508YLmg