Host to Device Memory Bandwidth

Hello all,

I have two cards, a 9800 GX2 and an 8800 GTX, and I am a bit surprised by the host-to-device memory bandwidth results. Below are the results of the bandwidth test from the SDK.

The 9800 is installed in a PCI-E 2.0 x16 slot, while the 8800 is installed in a PCI-E x16 slot.

As far as I know, PCI-E 2.0 can give up to double the bandwidth of PCI-E 1.0. Is there any reason why I am getting only a 10% speedup in bandwidth?

====================
Using device 0: GeForce 9800 GX2
Quick Mode
Host to Device Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2820.4

Quick Mode
Device to Host Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2440.3

Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 50090.0

Using device 0: GeForce 8800 GTX
Quick Mode
Host to Device Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2557.1

Quick Mode
Device to Host Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2168.6

Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 62202.4

=======================
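As a rough sketch of the arithmetic behind this question: the per-lane signalling rates (2.5 GT/s for PCI-E 1.x, 5 GT/s for PCI-E 2.0) and the 8b/10b line coding are from the PCI-E specifications, while the function and variable names are made up for illustration. It assumes the 8800 GTX slot runs at first-generation speed and treats the reported MB/s as 10^6 bytes per second, which the output above does not confirm.

```python
# Theoretical PCI-E bandwidth per direction, before packet/protocol overhead.
GT_PER_S = {"1.x": 2.5, "2.0": 5.0}  # raw transfer rate per lane, in GT/s
ENCODING = 8 / 10                     # 8b/10b line coding used by gen 1 and gen 2


def pcie_peak_mb_s(gen, lanes):
    """Peak MB/s (10^6 bytes/s) in one direction for a link of the given width."""
    bits_per_s = GT_PER_S[gen] * 1e9 * ENCODING * lanes
    return bits_per_s / 8 / 1e6


# Host-to-device figures copied from the bandwidthTest output above.
h2d_9800 = 2820.4  # 9800 GX2 in the PCI-E 2.0 x16 slot
h2d_8800 = 2557.1  # 8800 GTX in the other x16 slot (assumed gen 1)

peak_gen1 = pcie_peak_mb_s("1.x", 16)
peak_gen2 = pcie_peak_mb_s("2.0", 16)
measured_speedup = h2d_9800 / h2d_8800

print(f"PCI-E 1.x x16 peak: {peak_gen1:.0f} MB/s")
print(f"PCI-E 2.0 x16 peak: {peak_gen2:.0f} MB/s")
print(f"theoretical speedup: {peak_gen2 / peak_gen1:.2f}x")
print(f"measured speedup:    {measured_speedup:.2f}x")
# Assumes the tool's "MB" is 10^6 bytes; the share is a few percent higher if it is 2^20.
print(f"9800 GX2 share of gen-2 peak: {h2d_9800 / peak_gen2:.0%}")
```

Under these assumptions the link itself allows a 2x gain, so a gain of about 1.10x points to a limit somewhere other than the slot's nominal rating.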

Thanks!!!

Probably an issue with your motherboard BIOS…
I had the same thing with my GTX280, and a BIOS upgrade fixed the issue.

Which motherboard do you have?

My motherboard is “Asus Striker II - Series Formula -SLI780i”.

How much bandwidth should I expect? I mean, should I expect double the bandwidth with PCI-e 2.0?

Thanks.

Not quite. The speed of the memory and the speed of the motherboard interconnects may limit you (e.g. the 780i chipset has PCIe-2.0, but it is connected to the northbridge through a slower link).

780i motherboards get around 4 GiB/s with PCIe-2.0 cards.

There is a benchmark somewhere on the forums (sorry, I don’t remember where) where some HP and Sun workstations were tested. Those with Intel chipsets got close to 6 GiB/s, IIRC.

Thanks all!!

A BIOS update actually improved the transfer rate. Thank you very much, shsanjp.

Here are the new results:

============================

Using device 0: GeForce 9800 GX2
Quick Mode
Host to Device Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 3193.8

Quick Mode
Device to Host Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 3195.0

Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 50058.7

&&&& Test PASSED
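Figures in this thread are quoted in MB/s, GB/s, and GiB/s, so a like-for-like comparison needs a unit conversion. The sketch below converts the updated host-to-device figure to GiB/s under both readings of "MB"; which one the test uses is not stated here, and the helper name `to_gib_s` is hypothetical.

```python
MIB = 2**20  # bytes in a binary megabyte
GIB = 2**30  # bytes in a binary gigabyte


def to_gib_s(value, bytes_per_unit):
    """Convert a rate given in some unit of bytes per second to GiB/s."""
    return value * bytes_per_unit / GIB


reported = 3193.8  # host-to-device figure after the BIOS update

as_decimal = to_gib_s(reported, 10**6)  # if "MB" means 10^6 bytes
as_binary = to_gib_s(reported, MIB)     # if "MB" means 2^20 bytes

print(f"{as_decimal:.2f} GiB/s (decimal MB) or {as_binary:.2f} GiB/s (binary MB)")
```

Either way the result sits at roughly 3 GiB/s, so the choice of unit does not close the gap to the ~4 GiB/s quoted for 780i boards.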

Hi there,

I think your numbers are still too low.
A colleague of mine has a 9800GTX and he gets around 5GB/s.

My guess is that you will have to dig a bit more for beta BIOS updates (there should be one around that addresses this issue).
My motherboard was also an Asus, and I am pretty sure it was the same issue.

Check this thread to see if you can find the BIOS for your motherboard:

http://www.xtremesystems.org/forums/showth…t=186501&page=4

Good luck.
Warning: it is still a beta.

Thanks shsanjp,

The BIOS I installed was released only three days ago, so it is the latest.

The thread you pointed me to discusses a different ASUS model, the PCE3, while mine is the Formula Striker II.

I guess I will contact ASUS or wait for another BIOS update.

A number of factors come into play when it comes to bottlenecks: the speed of system memory and the motherboard chipset (780i in your case, with which I’m vaguely familiar). I’ve gotten roughly 4GB/s pinned transfer to a single 16-lane PCIe 2.0 card on a comparable 780i board. In that case the bottleneck was the bus between the 780i and the N200 chip (which served as a PCIe host endblock). I tried upgrading the memory, etc. Finally I swapped the board out for an Intel P45-based board (Asus P5Q Pro), and I get ~6.1GB/s.

I also had an unexplained bottleneck with an ASUS motherboard (Intel chipset) and an 8800GT.
Nothing really helped, including the latest BIOS. I believe the issue lies deeper in the hardware and is not reflected in the headline figures declared in the specs.

So far my bet is that if you want to get the most out of CUDA, you’ve got to use an NVIDIA motherboard ;-)

The other option is to look for configurations that others have already shown to reach the maximum theoretical bandwidth, and buy exactly that.

Just to throw another element into the mix, I also see a consistent difference on bandwidthTest results between Linux and Windows on the same hardware.

WinXP-64, cuda 2.0 beta (177.35) :

Linux-64 (Ubuntu 8.04), cuda 2.0 beta :

On this hardware, Linux is ~270MB/s faster than Windows for both h-to-d and d-to-h, unless some OS-dependent issue with the event timing is coming into play.

Either way, this is on a 790i (NVIDIA-based) motherboard, so my 5GB/s is lower than the 6GB/s on pstach’s P45 (Intel-based) motherboard, if bandwidthTest.exe is a comparable and trustworthy measure.

Thanks all,

I am actually running Linux-64 Fedora 8.

What puzzles me about the Asus P5Q Pro is that its PCIe 2.0 slots support only up to x8, which means up to 4GB/s theoretical bandwidth. I am not sure how pstach got 6GB/s!

I checked the Asus P5Q Pro specs here

http://www.asus.com/products.aspx?modelmen…=11&l3=709&l4=0

In CrossFire mode it runs two slots at x8 each; in single-card mode it is the full x16.
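A quick sanity check of the lane-width argument, as a sketch: PCIe 2.0 carries 500 MB/s per lane in each direction (5 GT/s with 8b/10b coding), so the ~6.1GB/s reported for the P5Q Pro can be compared against the ceiling for each link width. The names below are illustrative only.

```python
PER_LANE_MB_S = 500  # PCIe 2.0: 5 GT/s with 8b/10b coding, per lane, per direction


def link_peak_gb_s(lanes):
    """Theoretical one-direction ceiling in GB/s for a PCIe 2.0 link."""
    return lanes * PER_LANE_MB_S / 1000


measured_gb_s = 6.1  # figure reported for the P5Q Pro earlier in the thread

for lanes in (8, 16):
    peak = link_peak_gb_s(lanes)
    verdict = "fits" if measured_gb_s <= peak else "exceeds the ceiling"
    print(f"x{lanes}: peak {peak:.1f} GB/s, so {measured_gb_s} GB/s {verdict}")
```

A x8 link tops out at 4.0 GB/s, so a measurement of ~6.1GB/s is only consistent with the card running at x16.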

I have a 790i motherboard, the ASUS Striker II Extreme. I use SUSE 10.3 and have updated to the latest driver from NVIDIA; however, the bandwidth test result for my GTX280 is extremely low on my machine: only 1.7 GB/s even with pinned memory, just a little faster than the 1.5 GB/s with pageable memory.

I don’t know where to start fixing the problem. Any ideas are appreciated.

Thank you

It sounds like you’re not getting Gen-2 speeds. Here are some ideas extracted from what I’ve read in these forums:

First, make sure that your card is in a PCIe 2.0 slot (the motherboard manual will tell you which slots are the 2.0 x16 slots).

Someone here noted that you may have to modify a motherboard BIOS setting in order to enable PCIe 2.0 speeds.

Some people have reported bandwidth improvements by upgrading to the latest BIOS for their motherboard. [Not usually as big a jump as you’re looking for.]

Finally, I remember seeing in these forums a suggestion that you won’t get full speed if you’re missing one of the two power connections. I don’t have a GTX2*, so I’m not sure if this applies to you, but maybe double-check your power connections.

Hope that helps… Maybe someone with the same motherboard or GPU will have a better idea.

The latest driver from NVidia caused a bad slowdown in CPU<->device bandwidth for some people:

http://forums.nvidia.com/index.php?showtopic=75466

Plus, NVidia acknowledges that device-to-device bandwidth is a bit worse (5-10%) on CUDA 2.0 vs. 1.1 (same thread).

Let’s hope they fix it ASAP.

Fernando

Thanks, that is really helpful. I get 5.7GB/s with the GTX280; however, the device-to-device bandwidth is still pretty low: 112 vs. 147 (in the NVIDIA spec).

Real bandwidth will always be slightly lower than theoretical peak bandwidth.
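To put a number on the gap, here is a minimal sketch that expresses the measured device-to-device figure as a fraction of the quoted spec figure. Both values are taken as quoted in the post above and assumed to be in the same unit (GB/s); the variable names are illustrative.

```python
measured = 112.0  # device-to-device figure reported above
spec = 147.0      # peak figure as quoted from the spec above

efficiency = measured / spec
shortfall = 1 - efficiency

print(f"achieved {efficiency:.1%} of the quoted peak ({shortfall:.1%} below it)")
```

With these inputs the measurement is roughly three quarters of the quoted peak.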

Actually, you can achieve better bandwidth results by increasing the PCIe clock speed. However, I do not recommend doing this, since it can have really bad side effects if you don’t know how to do it (it may corrupt your system). You can try slightly increasing the PCIe base clock (100) in steps of 2 to at most 5, without going more than 15% over the base, and see if performance increases. Sometimes it works, sometimes it does not.
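The stepping rule above can be sketched as a simple schedule. This only lists candidate values and does not touch any hardware setting; it assumes the base clock and step sizes are in MHz, which the post does not state, and the function name is hypothetical.

```python
BASE_MHZ = 100         # stock PCIe base clock mentioned above (unit assumed to be MHz)
MAX_INCREASE_PCT = 15  # stay within 15% of the base


def clock_steps(step, base=BASE_MHZ, max_pct=MAX_INCREASE_PCT):
    """Candidate clock values above the base, never exceeding the percentage cap."""
    if not 2 <= step <= 5:
        raise ValueError("step should be between 2 and 5, per the advice above")
    cap = base + base * max_pct // 100  # integer math avoids float rounding at the cap
    return list(range(base + step, cap + 1, step))


for step in (2, 5):
    print(f"step {step}: {clock_steps(step)}")
```

Each candidate would be tried one at a time, re-running the bandwidth test and backing off at the first sign of instability.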