The fastest platform for GPU computing

I have to do image processing on a huge dataset (8k × 200k).
Data transfer is the bottleneck of my project.
I use an Intel X58 chipset and allocate my buffers as pinned memory, which gives about 5 GB/s transfer speed (both upload and download).
I have 2 GTX 285 GPUs.

Is this already the best platform as far as bandwidth goes?
Or are there already other chipsets that make transfers faster?

My requirement is more PCI-E lanes, so that I can plug in more GTX 285 GPUs and reduce the number of PCs.
And more PCI-E bandwidth, so data transfers are faster.

I also need one or two PCI-E 4x slots for my frame grabbers.

Thanks.

PS. I visited NVIDIA’s website and saw some chipsets that also provide two PCI-E 2.0 16x slots (one for Intel Core 2 and one for AM3).
Are they faster than the X58?

The Intel X58 and AMD 790XT are, in my experience, the fastest chipsets for PCI-E bandwidth in CUDA. Both give a sustained 5 GB/s with pinned memory transfers, as you have discovered. There isn’t anything faster than what you already have, I am afraid.
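
For reference, the ~5 GB/s figure is normally measured the way the SDK’s bandwidthTest does it: one large pinned host buffer, one big cudaMemcpy, and CUDA event timing. A minimal sketch (the 64 MB buffer size is just an arbitrary choice, not anything specific to this setup):

[code]
/* Pinned-memory host-to-device bandwidth check, roughly what bandwidthTest does. */
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const size_t bytes = 64 << 20;                                   /* 64 MB test buffer */
    float *h_pinned, *d_buf;
    cudaHostAlloc((void **)&h_pinned, bytes, cudaHostAllocDefault);  /* page-locked host memory */
    cudaMalloc((void **)&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaMemcpy(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Host->Device: %.2f GB/s\n", (bytes / 1.0e9) / (ms / 1000.0));

    cudaFreeHost(h_pinned);
    cudaFree(d_buf);
    return 0;
}
[/code]

Swap cudaHostAlloc for plain malloc and you will see the pageable-memory rate drop well below the pinned figure.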

Thanks for your reply.

Does AMD 790XT mean the AMD 790X chipset? (I cannot find an AMD 790XT chipset on AMD’s website.)

And how does the AMD 790X compare with the AMD 790FX?

Also, will they be better than the Intel X58 when using CUDA?

How about the Intel P55?

Thank you so much.

Sorry, perhaps I wasn’t clear enough. You already have the fastest single-CPU-socket, dual-GPU platform there is.

Thank you so much.

So even the AMD 790-series chipsets cannot be faster than the X58 for host-to-device copy speed, right?

What determines the transfer speed: the Core i7 or the X58 chipset?

If I change from the X58 to the P55 and still use a Core i7, will data transfer between host and device be slower?

For the third (and last) time: you have the fastest single-CPU platform there is.

5 GB/s sustained is basically as fast as the PCI-E v2 standard can achieve in practice on a 16-lane link, once signaling overheads (almost 20%, if I recall correctly) and latency are factored in. The Socket 1156 arrangement has slightly lower latency for a single 16x link, because the PCI-E controller is on the CPU silicon and bypasses the CPU’s external bus. But it is slower than the X58 if you want more than one GPU.
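
As a rough sanity check of those numbers (using only the published PCI-E 2.0 figures, nothing specific to this system): 16 lanes × 5 GT/s = 80 Gbit/s raw; the 8b/10b line encoding leaves 64 Gbit/s ≈ 8 GB/s of usable line rate per direction; and packet/protocol overhead plus latency typically brings the practically achievable pinned-memory rate down to roughly 5–6 GB/s, which is the regime being measured here.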

:)

5 GB/s is the max you can get for now… but if you want more, then wait for this:

http://www.pcisig.com/news_room/08_08_07/

Looks nice…

Sadly, you’ll have to wait a little bit longer:

http://www.tomshardware.com/news/PCI-Expre…Delay,8515.html

Official PCI-E 3.0 specs pushed back to Q2 2010, and actual products pushed to 2011.

@darot

To complete what everyone else has said: you do (almost) have the fastest platform there is. You should be looking more at reducing the number of host<->GPU transfers. That solution will have an immediate impact on your computer, my computer, and any computer with a CUDA-enabled GPU.
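
One way to attack the transfer cost (just a sketch under assumptions of my own: the chunking, the two-stream ping-pong, and the process_chunk kernel are placeholders, not anything from your code) is to stage chunks through pinned memory and overlap uploads with the kernel working on the previous chunk, using cudaMemcpyAsync and streams:

[code]
#include <cstring>
#include <cuda_runtime.h>

/* Placeholder kernel standing in for the real image processing. */
__global__ void process_chunk(float *data, size_t n)
{
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

void run(const float *h_src, size_t n_chunks, size_t chunk_elems)
{
    const size_t chunk_bytes = chunk_elems * sizeof(float);

    float *h_pinned;                                   /* page-locked staging buffer */
    cudaHostAlloc((void **)&h_pinned, 2 * chunk_bytes, cudaHostAllocDefault);

    float *d_buf[2];
    cudaStream_t stream[2];
    for (int i = 0; i < 2; ++i) {
        cudaMalloc((void **)&d_buf[i], chunk_bytes);
        cudaStreamCreate(&stream[i]);
    }

    for (size_t c = 0; c < n_chunks; ++c) {
        int s = (int)(c & 1);                          /* ping-pong between two streams */
        float *h_stage = h_pinned + s * chunk_elems;

        /* wait until the previous async copy out of this staging half has finished */
        cudaStreamSynchronize(stream[s]);
        memcpy(h_stage, h_src + c * chunk_elems, chunk_bytes);

        /* async upload + kernel in the same stream: the upload of chunk c can
           overlap the kernel still running on chunk c-1 in the other stream */
        cudaMemcpyAsync(d_buf[s], h_stage, chunk_bytes,
                        cudaMemcpyHostToDevice, stream[s]);
        process_chunk<<<(unsigned)((chunk_elems + 255) / 256), 256, 0, stream[s]>>>(
            d_buf[s], chunk_elems);
    }
    cudaDeviceSynchronize();

    for (int i = 0; i < 2; ++i) {
        cudaStreamDestroy(stream[i]);
        cudaFree(d_buf[i]);
    }
    cudaFreeHost(h_pinned);
}
[/code]

On a GTX 285 (one copy engine) this hides the upload behind the compute rather than making the link itself faster, which is usually the bigger win anyway.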

If you really, really need more bandwidth, there is a possibly better solution (note the “possibly”). You can split 16 PCI-E 2.0 lanes into 32 PCI-E 2.0 lanes with the nForce 200 chip. Note, however, that this will only give you faster overall performance if you are transferring data to/from GPU1 at a different time than to/from GPU2. You will still get 5 GB/s transfer rates for each GPU. If, however, you transfer data to both GPUs on the nForce 200 chip at the same time, then you will see far lower transfer rates.
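
To make that “at a different time” caveat concrete, here is a trivial sketch (modern runtime-API style with one host thread switching devices; in the CUDA 2.x days you would use one host thread per GPU instead, and d_a/d_b are assumed to have been allocated on GPU 0 and GPU 1 respectively). Because cudaMemcpy is synchronous, the two uploads never run at the same moment, so each gets the full bandwidth of the shared x16 uplink:

[code]
#include <cuda_runtime.h>

void upload_staggered(const float *h_a, const float *h_b,
                      float *d_a, float *d_b, size_t bytes)
{
    cudaSetDevice(0);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);   /* finishes completely... */

    cudaSetDevice(1);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);   /* ...before this one starts */
}
[/code]

Issue two cudaMemcpyAsync calls from pinned memory to both GPUs at once instead, and they will contend for the same 16 upstream lanes of the nForce 200.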

This solution is both cumbersome and expensive. Still, it is useful if you need to connect four behemoths in the same system. The only motherboard that I know of that allows 4 GPUs to be connected at full 16x PCI-E 2.0 is the ASUS P6T7 WS SuperComputer (there may be others that I’m unaware of). It uses two nForce 200 chips. Still, you will not see more than 5 GB/s transfer rate per GPU.

Also make sure you are using triple-channel DDR3-1600 memory with your i7 (though keep your DRAM voltage below 1.55V). In my experience, memory speed on X58 has a direct effect on PCI-E transfer performance.

Best wishes,
Alex

There’s also the TYAN S7025AGM2NR board, which has 4 PCIe 2.0 x16 slots (and supports dual Core-i7 Xeons), but I think it’s pretty new (since I haven’t heard anyone mention it yet).

Given the problem reports with other dual X58 NUMA systems, I hope someone gets their hands on this and does some CUDA testing with it.

Especially given that it’s advertised as “Certified with NVIDIA Tesla C1060 & S1070 computing system”: http://www.newegg.com/Product/Product.aspx…N82E16813151208

As we are looking at this board: what kind of problems have been experienced?

I am waiting on this dual CPU’ed, over-clockable beauty.

[url=“http://www.evga.com/FORUMS/tm.aspx?&m=107186&mpage=1”]http://www.evga.com/FORUMS/tm.aspx?&m=107186&mpage=1[/url]

It will most likely be my next build.

I see the TYAN board uses dual 5520 IOHs, which both have 36 lanes of PCI-E 2.0 connectivity, so that’s way better than the ASUS board I mentioned. The price seems surprisingly acceptable as well.

Does anyone know what PCI-E configuration the EVGA board Talonman mentioned has?

One more thread with an XS link in it to check out too:
[url=“http://www.evga.com/FORUMS/tm.aspx?m=117952”]http://www.evga.com/FORUMS/tm.aspx?m=117952[/url]

Regarding other dual-socket boards, I seem to recall threads in the forum where people were not getting the Host-to-Device/Device-to-Host bandwidth they were expecting, even when setting CPU affinity on the process. You’ll have to google around.
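
For anyone experimenting with those dual-IOH boards: the usual trick is to bind the process to the socket whose IOH the GPU hangs off before the CUDA context is created, so the pinned buffers end up in local memory. A sketch under assumptions of my own (the core numbering and the GPU-to-socket mapping below are made up; check yours with numactl or /proc/cpuinfo):

[code]
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    for (int core = 0; core < 4; ++core)    /* assumed: cores 0-3 sit on socket 0 */
        CPU_SET(core, &mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0)
        perror("sched_setaffinity");

    cudaSetDevice(0);                       /* assumed: GPU 0 is attached to socket 0's IOH */
    /* ... cudaHostAlloc the pinned buffers and run the transfers from here ... */
    return 0;
}
[/code]

Even with the affinity set, the reports mentioned above suggest the dual-chipset boards still fell short of the single-X58 numbers, so treat this as necessary rather than sufficient.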

Dual-socket single X58 boards are fine, dual-socket dual-X58 boards are not.

Here’s the thread discussing the performance impact of the dual chipset.

Note that the new EVGA board uses dual nf200 chips. Do you think that will also be fine getting expected Host-to-Device/Device-to-Host bandwidth?

A Youtube video just to help get a better grip on the size of this beast… ;)

http://www.youtube.com/watch?v=-16R508YLmg