Low Aggregate PCIe Bandwidth for 9800 GX2

Hello all,

I have a 9800 GX2 that I’m trying to exercise at full PCIe v2 bandwidth. I have verified that the motherboard (MSI P7N Diamond) supports PCIe v2, and the card itself is PCIe v2. Running the bandwidth test with device=all yields 6+ GB/s, but that test runs on each GPU serially rather than in parallel. When I run my own test with both GPUs transferring at the same time, I get 3 GB/s total, which is the same as transferring to a single GPU, and also the same as a PCIe v1 card would get.

Is this a problem with my motherboard? My tests? My assumptions? I am assuming that both GPUs should be able to upload at the same time and reach 6 GB/s aggregate, and the same for download.

Matthew

Well, yeah, that assumption is wrong. A 9800 GX2 is two G92 chips sharing a single slot. PCIe 2.0 gets you about 6 GB/s per slot. Ergo, when streaming data to both GPUs simultaneously, you get 3 GB/s per GPU.
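For reference, the lane arithmetic behind that figure (assuming the usual ~20–25% packet/protocol overhead for large transfers):

$$16\ \text{lanes} \times 500\ \mathrm{MB/s} = 8\ \mathrm{GB/s\ raw} \;\Rightarrow\; \approx 6\ \mathrm{GB/s\ effective},\qquad 6\ \mathrm{GB/s} \div 2\ \mathrm{GPUs} = 3\ \mathrm{GB/s\ per\ GPU}.$$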

Well, my assumption was 6 GB/s aggregate, not per GPU. As it is, I get 3 GB/s aggregate across both GPUs simultaneously. That is, when both are going at the same time, I get 1.5 GB/s per GPU.

Okay, let me make sure I understand what you’re saying…

Running bandwidthTest on each chip independently = 6 GB/s per chip.

Running your own bandwidth test on both chips simultaneously = 1.5 GB/s per chip?

Haha, sorry, I’m suffering from imprecise language. :) The bottom line is: I never get more than 3 GB/s over the bus at any given moment.

Running bandwidthTest on one chip: 3 GB/s

Running bandwidthTest on two chips (which does this serially for each chip): 6 GB/s (3 GB/s per chip, not simultaneously)

Running a test where transfers go to both chips simultaneously: 3 GB/s (1.5 GB/s per chip simultaneously)

Double check that PCIe 2.0 is enabled in the BIOS, then. I’ve seen some boards where for whatever reason it’s not enabled by default.

I’ve heard this from a few people, but unfortunately I cannot find any option pertaining to PCIe v2 in the BIOS. I’ve combed through it several times.

Would someone be willing to run the benchmark I’ve concocted on their GX2? I have attached the source, renamed from a .cu to a .txt extension. Just compile with -lpthread.
bandwidthTest.txt (1.56 KB)
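In case the attachment doesn’t open, the gist of it is one pthread per GPU, each pushing pinned host memory to its device while the other does the same. Here is a minimal sketch of that approach (constants, names, and the device list are illustrative, not the exact attached code):

```c
/* Sketch of a concurrent host-to-device bandwidth test: one pthread per
 * GPU, pinned host buffers, both GPUs copying at the same time.
 * Build (hypothetical file name): nvcc -O2 -o bwtest bwtest.cu -lpthread */
#include <stdio.h>
#include <pthread.h>
#include <sys/time.h>
#include <cuda_runtime.h>

#define SIZE (32 * 1024 * 1024)   /* 32 MiB per copy, like bandwidthTest */
#define REPS 64                   /* copies per GPU in the timed loop */

static pthread_barrier_t ready;   /* so setup cost isn't timed */

static void *worker(void *arg)
{
    int dev = *(int *)arg;
    void *h, *d;

    cudaSetDevice(dev);                /* bind this thread to one GPU */
    cudaMallocHost(&h, SIZE);          /* pinned host memory */
    cudaMalloc(&d, SIZE);
    cudaMemcpy(d, h, SIZE, cudaMemcpyHostToDevice);   /* warm-up */

    pthread_barrier_wait(&ready);      /* start copying in lockstep */
    for (int i = 0; i < REPS; ++i)
        cudaMemcpy(d, h, SIZE, cudaMemcpyHostToDevice);

    cudaFree(d);
    cudaFreeHost(h);
    return NULL;
}

int main(void)
{
    int ids[2] = {0, 1};               /* the two GX2 chips */
    pthread_t t[2];
    struct timeval t0, t1;

    pthread_barrier_init(&ready, NULL, 3);   /* 2 workers + main */
    printf("Testing %d and %d\n", ids[0], ids[1]);
    for (int i = 0; i < 2; ++i)
        pthread_create(&t[i], NULL, worker, &ids[i]);

    pthread_barrier_wait(&ready);      /* both GPUs set up: start clock */
    gettimeofday(&t0, NULL);
    for (int i = 0; i < 2; ++i)
        pthread_join(t[i], NULL);
    gettimeofday(&t1, NULL);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    double mb   = 2.0 * REPS * SIZE / (1024.0 * 1024.0);  /* both GPUs */
    printf("%f MB/s\n", mb / secs);    /* aggregate bus bandwidth */
    return 0;
}
```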

Here is the result on my 9800 GX2 system (780i MB):

Testing 0 and 1

3953.766717 MB/s

This is right in line with the single GPU performance as expected:

Quick Mode

Host to Device Bandwidth for Pinned memory

Transfer Size (Bytes)   Bandwidth(MB/s)

 33554432               3952.1

Quick Mode

Device to Host Bandwidth for Pinned memory

Transfer Size (Bytes)   Bandwidth(MB/s)

 33554432               3874.4

Just for fun, I also modified your bandwidth test file to run on GPUs 1 and 2, which are the 2nd GPU of the 9800 GX2 and an 8800 GTX (G92) in the 2nd PCI-e x16 slot. Thus, this test removes the effect of the shared PCI-e slot for the 9800 GX2 and benchmarks how fast 2 GPUs can be fed with the full resources available.
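In terms of the sketch above, that was just a change to the device list (a hypothetical one-liner, since the actual attachment may be structured differently):

```c
int ids[2] = {1, 2};   /* 2nd chip of the GX2 + the GPU in the other slot */
```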

Not surprisingly, the results are:

Testing 1 and 2

4010.857409 MB/s

Why am I not surprised that this isn’t faster? 1) The 780i MB runs a single PCIe link to the NF200 chip, which drives both x16 PCIe v2 slots, so it cannot possibly feed both cards at full bandwidth. 2) System memory speed also starts to become a limiting factor at these rates.

I’ve seen bandwidth tests that show high bandwidth to a single-chip PCIe v2 card, like this: http://forums.nvidia.com/index.php?act=Pri…er&f=64&t=71468
On average, 5.2 GB/s to a single GTX 280.

Also, here is a test done on a 9800 GTX:
http://forums.nvidia.com/lofiversion/index.php?t70042.html

It seems as though each chip can only use eight lanes of PCIe, and only one chip or the other can transfer at a time… meaning that full PCIe v2 speeds aren’t achievable.

I found another machine to plug this card into, and the results are much the same.

More thoughts?


That doesn’t make any sense. The benchmarks I ran got nearly 4 GiB/s to a single GPU, which is the theoretical peak of an x8 PCIe v2 connection. There is a packet overhead of ~15% on each transfer, so I could not possibly have gotten that performance over an x8 connection to a single GPU.
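Spelling that argument out (x8 PCIe v2 is 8 lanes at 500 MB/s each, and using the ~3950 MB/s single-GPU figure from above):

$$8 \times 500\ \mathrm{MB/s} = 4\ \mathrm{GB/s\ raw};\qquad 4\ \mathrm{GB/s} \times (1 - 0.15) \approx 3.4\ \mathrm{GB/s\ usable} < 3.95\ \mathrm{GB/s\ measured},$$

so the link feeding each GPU has to be wider than x8.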

http://www.bit-tech.net/hardware/2008/03/1…graphics_card/2 details that the switch connecting the two GPUs in the 9800 GX2 delivers x16 bandwidth to each GPU.

As I outlined in my post above, the chipset I’m running is to blame for the ~4 GiB/s throughput. Every report of 5–6 GiB/s I’ve seen on these forums was running on an Intel PCIe v2 chipset.

On such a board, I bet it would also sustain ~6 GiB/s to a single GPU, or 6 GiB/s total spread evenly across the 2 GPUs. I just don’t have such a system to test on.

I figured an update is in order, as I have confirmed that the chipset was the culprit: I bought an Intel board and the problems went away.

The bandwidth test I posted reports 5384 MB/s.

NVIDIA’s bandwidth test reports:

Quick Mode

Host to Device Bandwidth for Pinned memory

Transfer Size (Bytes)   Bandwidth(MB/s)

 33554432               9486.3

Quick Mode

Device to Host Bandwidth for Pinned memory

Transfer Size (Bytes)   Bandwidth(MB/s)

 33554432               8840.0

Those are the fastest bandwidth numbers I’ve seen on the forum! What board is it? I’m assuming that it runs DDR3 memory, too, given the perf you are getting.

Well, it is running DDR3, but this exposes my gripe about NVIDIA’s bandwidth test: those figures are the sum of the bandwidths to each chip measured serially, not in parallel. The parallel test gives a little more than half those figures, and running the test on one chip gives almost exactly half.
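In numbers, taking host-to-device as the example (the per-chip figure is inferred from the “almost exactly half” observation):

$$\underbrace{\approx\!4743 + \approx\!4743}_{\text{one chip at a time, summed}} = 9486\ \mathrm{MB/s\ reported},\qquad \text{vs.}\ 5384\ \mathrm{MB/s}\ \text{with both chips at once}.$$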

Ah, that explains it.

And it gives another example of a DDR3 board not getting above ~6 GiB/s over the bus at once. Intel’s PCIe v2 implementation must have a bottleneck in there somewhere too, just a higher one than the NF200’s.