I have a 9800 GX2 that I’m trying to exercise at full PCIe v2 bandwidth. I have verified that the motherboard (MSI P7N Diamond) supports PCIe v2, and the card itself is PCIe v2. Running the bandwidth test with device=all yields 6+ GB/s, but that test runs the transfers serially rather than in parallel. When I run my own test with both GPUs transferring at the same time, I get 3 GB/s aggregate, which is the same as transferring to a single GPU, and also the same as what a PCIe v1 card would get.
Is this a problem with my motherboard? My tests? My assumptions? I am assuming that both GPUs should be able to upload at the same time and get 6 GB/s combined, and likewise for download.
Well, yeah, that assumption is wrong. A 9800 GX2 is two G92 chips connected to each other in a single slot. PCIe 2.0 gets you about 6 GB/s per slot. Ergo, when streaming data to both GPUs simultaneously, you get 3 GB/s per GPU.
Well, my assumption was 6 GB/s aggregate, not per GPU. As it is, I get 3 GB/s aggregate across both GPUs simultaneously. That is, when both are going at the same time, I get 1.5 GB/s per GPU.
I’ve heard this from a few people, but unfortunately I cannot find an option pertaining to PCIe2 in the BIOS. I’ve spidered it several times.
Would someone be willing to run the benchmark I’ve concocted on their GX2? I have attached the source, renamed from .cu to a .txt extension. Just compile with -lpthread. bandwidthTest.txt (1.56 KB)
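For anyone who wants the idea without downloading: the test is essentially one pthread per GPU, each pushing a pinned buffer host-to-device in a loop, with the aggregate bandwidth computed from the wall-clock time across both threads. A minimal sketch of that approach (buffer size, iteration count, and names here are illustrative, not the exact attached code):

// One pthread per GPU, each copying a pinned host buffer to its device in a
// loop; aggregate bandwidth is computed over the wall-clock time for both
// threads together. Device indices, sizes, and iteration counts are examples.
#include <cuda_runtime.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/time.h>

#define SIZE_BYTES (32 * 1024 * 1024)   /* 32 MB per transfer */
#define ITERATIONS 64

static double wall_time(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

static void *copy_thread(void *arg)
{
    int dev = *(int *)arg;
    void *host_buf, *dev_buf;

    cudaSetDevice(dev);
    cudaMallocHost(&host_buf, SIZE_BYTES);   /* pinned host memory */
    cudaMalloc(&dev_buf, SIZE_BYTES);

    /* cudaMemcpy blocks until the copy completes, so no extra sync is needed */
    for (int i = 0; i < ITERATIONS; ++i)
        cudaMemcpy(dev_buf, host_buf, SIZE_BYTES, cudaMemcpyHostToDevice);

    cudaFree(dev_buf);
    cudaFreeHost(host_buf);
    return NULL;
}

int main(void)
{
    int devices[2] = { 0, 1 };   /* the two halves of the GX2 */
    pthread_t threads[2];

    printf("Testing %d and %d\n", devices[0], devices[1]);

    double start = wall_time();
    for (int i = 0; i < 2; ++i)
        pthread_create(&threads[i], NULL, copy_thread, &devices[i]);
    for (int i = 0; i < 2; ++i)
        pthread_join(threads[i], NULL);
    double elapsed = wall_time() - start;

    /* total data moved by both GPUs while transferring at the same time */
    double total_mb = 2.0 * ITERATIONS * SIZE_BYTES / (1024.0 * 1024.0);
    printf("%f MB/s\n", total_mb / elapsed);
    return 0;
}

Compile with nvcc and link against pthreads, e.g. nvcc parbw.cu -o parbw -lpthread (file name illustrative).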
Here is the result on my 9800 GX2 system (780i MB):
Testing 0 and 1
3953.766717 MB/s
This is right in line with the single GPU performance as expected:
Quick Mode
Host to Device Bandwidth for Pinned memory
Transfer Size (Bytes)   Bandwidth(MB/s)
33554432                3952.1

Quick Mode
Device to Host Bandwidth for Pinned memory
Transfer Size (Bytes)   Bandwidth(MB/s)
33554432                3874.4
Just for fun, I also modified your bandwidth test file to run on GPUs 1 and 2, which are the 2nd GPU of the 9800 GX2 and an 8800 GTX (G92) in the 2nd PCI-e x16 slot. Thus, this test removes the effect of the shared PCI-e slot for the 9800 GX2 and benchmarks how fast 2 GPUs can be fed with the full resources available.
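The modification itself was just the pair of device indices handed to the two copy threads, along these lines (illustrative only, assuming the indices live in a small array as in the sketch earlier in the thread):

/* originally { 0, 1 }: the two halves of the 9800 GX2 */
int devices[2] = { 1, 2 };   /* 2nd half of the GX2 plus the card in the other slot */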
Not surprisingly, the results are:
Testing 1 and 2
4010.857409 MB/s
Why am I not surprised that this isn’t faster? 1) The 780i MB runs a single PCIe link to the NF200 chip, which drives both x16 PCIe v2 slots, so it cannot possibly feed both cards at full bandwidth. 2) System memory speed also starts to become a limiting factor at these rates.
It seems like each chip can only use eight lanes of PCIe, and only one or the other can transfer at a time… meaning that full PCIe v2 speeds aren’t achievable.
I found another machine to plug this card into, and the results are much the same.
That doesn’t make any sense. The benchmarks I ran got nearly 4 GiB/s to a single GPU, which is already the theoretical peak of an x8 PCIe v2 connection. With ~15% packet overhead on each transfer, an x8 link could only deliver around 3.4 GiB/s in practice, so I could not possibly be getting this performance over an x8 connection to a single GPU.
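To spell out the arithmetic, here is a back-of-the-envelope calculation (assumed round figures: 500 MB/s per lane per direction for PCIe 2.0 after 8b/10b encoding, and ~15% packet overhead on top of that):

/* Rough PCIe 2.0 bandwidth arithmetic.
   Assumptions: 5 GT/s per lane with 8b/10b encoding -> 500 MB/s per lane
   per direction, and ~15% packet/protocol overhead on top of that. */
#include <stdio.h>

int main(void)
{
    const double per_lane_mbs = 500.0;   /* MB/s per lane, PCIe 2.0 */
    const double overhead     = 0.15;    /* rough packet overhead */

    for (int lanes = 8; lanes <= 16; lanes += 8) {
        double raw = lanes * per_lane_mbs;
        printf("x%-2d: %5.0f MB/s raw, ~%5.0f MB/s after overhead\n",
               lanes, raw, raw * (1.0 - overhead));
    }
    return 0;
}

That works out to roughly 3400 MB/s usable on an x8 link and roughly 6800 MB/s on an x16 link, so seeing ~4 GiB/s rules out an x8 connection, while the ~6 GiB/s people report is in the right ballpark for a full x16 link after overhead.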
As I outlined in my post above, the chipset I’m running is to blame for the ~4 GiB/s throughput. Every report of 5-6 GiB/s I’ve seen on these forums was from a machine running an Intel PCIe v2 chipset.
On such a board, I bet it would also sustain ~6 GiB/s to a single GPU, or ~6 GiB/s total split evenly across the two GPUs. I just don’t have such a system to test on.
Those are the fastest bandwidth numbers I’ve seen on the forum! What board is it? I’m assuming that it runs DDR3 memory, too, given the perf you are getting.
Well, it is running DDR3, but this exposes my gripe about the NVIDIA bandwidth test: the number it reports is the sum of the bandwidths to each chip measured serially, not in parallel. The parallel test gives a little more than half those figures, and running the test on one chip gives almost exactly half those figures.
And it’s another example of a DDR3 board not getting above ~6 GiB/s. Intel’s PCIe v2 implementation must have a bottleneck in there somewhere too, just a higher one than the NF200’s.