Bandwidth performance on PCI-E v1 slow?

After swapping one of my 8800 GTXs for a T10P, I get quite slow memory bandwidth for host-to-device and device-to-host copies. The device-to-device copies are also not as much faster as I expected. Are these expected pre-production effects?

Device 0: “GeForce 8800 GTX”
Major revision number: 1
Minor revision number: 0
Total amount of global memory: 804585472 bytes
Number of multiprocessors: 16
Number of cores: 128
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.35 GHz
Concurrent copy and execution: No

Device 1: “GT200”
Major revision number: 1
Minor revision number: 3
Total amount of global memory: 4294705152 bytes
Number of multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.51 GHz
Concurrent copy and execution: Yes

bandwidthTest --device=0
Using device 0: GeForce 8800 GTX
Quick Mode
Host to Device Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2000.4

Quick Mode
Device to Host Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1842.8

Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 71468.5

bandwidthTest --memory=pinned --device=0
Using device 0: GeForce 8800 GTX
Quick Mode
Host to Device Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 3151.8

Quick Mode
Device to Host Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2956.5

Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 71468.5

bandwidthTest --memory=pinned --device=1
Using device 1: GT200
Quick Mode
Host to Device Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1718.5

Quick Mode
Device to Host Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1663.2

Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 74748.9

bandwidthTest --device=1
Using device 1: GT200
Quick Mode
Host to Device Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1666.5

Quick Mode
Device to Host Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1502.1

Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 74818.8

One thing to note: you are comparing the 8800 GTX in slot 0 with the T10P in slot 1. It’s possible that slot 1 has fewer physical lanes. Can you try swapping the boards?

Mark
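The lane-count hypothesis can be checked roughly against the posted numbers. The sketch below assumes the commonly quoted theoretical peak of 250 MB/s per lane per direction for PCIe 1.x (500 MB/s for 2.0); the helper names are made up for illustration, and the slot widths of this machine are not stated in the thread. Real transfers fall short of these peaks, and the result can shift by a few percent depending on whether the tool counts a megabyte as 10^6 or 2^20 bytes.

```python
# Approximate per-lane, per-direction PCIe rates in MB/s after 8b/10b encoding.
# These are theoretical peaks (an assumption of this sketch, not from the thread).
MB_PER_S_PER_LANE = {1: 250.0, 2: 500.0}


def peak_bandwidth(gen, lanes):
    """Theoretical one-way peak in MB/s for a link of the given generation and width."""
    return MB_PER_S_PER_LANE[gen] * lanes


def narrowest_gen1_link(measured_mb_s, widths=(1, 4, 8, 16)):
    """Smallest PCIe 1.x width whose peak covers the measured rate, else None."""
    for lanes in widths:
        if peak_bandwidth(1, lanes) >= measured_mb_s:
            return lanes
    return None


# Pinned host-to-device figures quoted in the first post.
measurements = {"GeForce 8800 GTX (device 0)": 3151.8, "T10P (device 1)": 1718.5}

for name, rate in measurements.items():
    lanes = narrowest_gen1_link(rate)
    print(f"{name}: {rate} MB/s fits within a PCIe 1.x x{lanes} link "
          f"(peak {peak_bandwidth(1, lanes):.0f} MB/s)")
```

Under these assumptions the 8800 GTX figure needs at least an x16 link, while the T10P figure would also fit within an x8 link. That is consistent with the fewer-lanes suggestion but does not prove it.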

Before the swap I got the same values for both 8800 GTXs, so I am afraid that is probably not the reason. The system is a Dell XPS 720H2C, which comes standard with two 8800 GTXs.

If you want me to try anyway, I can probably do that on Monday (or possibly tomorrow, if I am really lucky and my running calculation finishes before 17:00). Let me know if it is still worth trying.

Another reason might be that I only have two 6-pin connectors available, and the card I received has its 8-pin socket on a different edge than its 6-pin socket. The instruction sheet I received says to put the 6-pin connector in the 8-pin socket next to the other 6-pin connector. I tried putting it at the top; maybe it should be at the bottom?

And what about the device-to-device copies? Will device bandwidth be higher in the final product? (That would probably be our major performance gain, and as I understood from Dave it should be getting a nice boost.)
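To put numbers on the gap, the pinned-memory figures from the first post can be compared directly. This is only arithmetic on the values already quoted; the dictionary keys and the helper name are illustrative.

```python
# Pinned-run figures copied from the bandwidthTest output in the first post (MB/s).
gtx_8800 = {"host_to_device": 3151.8, "device_to_host": 2956.5, "device_to_device": 71468.5}
t10p = {"host_to_device": 1718.5, "device_to_host": 1663.2, "device_to_device": 74748.9}


def relative_change(new, old):
    """Percentage change of new relative to old."""
    return (new - old) / old * 100.0


for key in gtx_8800:
    change = relative_change(t10p[key], gtx_8800[key])
    print(f"{key}: {change:+.1f}% (T10P vs 8800 GTX)")
```

On these figures the T10P's device-to-device rate is under 5% higher than the 8800 GTX's, while both PCIe directions are more than 40% lower.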

Hello everyone!
Here are some benchmark numbers from my T10P sample for your reference.
The host computer is a Dell Precision T3400.

Device 0: “GT200”
Major revision number: 1
Minor revision number: 3
Total amount of global memory: 1073479680 bytes
Number of multiprocessors: 24
Number of cores: 192
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.08 GHz
Concurrent copy and execution: Yes

Device 1: “Quadro FX 1700”
Major revision number: 1
Minor revision number: 1
Total amount of global memory: 536543232 bytes
Number of multiprocessors: 4
Number of cores: 32
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 0.92 GHz
Concurrent copy and execution: Yes

bandwidthTest.exe --device=0
Using device 0: GT200
Quick Mode
Host to Device Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2169.0

Quick Mode
Device to Host Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2126.5

Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 85115.4

bandwidthTest.exe --memory=pinned --device=0
Using device 0: GT200
Quick Mode
Host to Device Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5023.6

Quick Mode
Device to Host Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5502.7

Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 85113.5
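For comparison with the thread title, the pinned figures above can be checked against theoretical link peaks. The sketch assumes the commonly quoted 250 MB/s (PCIe 1.x) and 500 MB/s (PCIe 2.0) per lane per direction. The function name is hypothetical, and the PCIe generation of the T3400's slot is not stated in the post.

```python
# Theoretical one-way peaks in MB/s per lane after 8b/10b encoding
# (assumed values for this sketch; real transfers achieve less).
PER_LANE = {"1.x": 250.0, "2.0": 500.0}


def feasible_links(measured_mb_s, widths=(8, 16)):
    """List (generation, lanes) pairs whose theoretical peak covers the measured rate."""
    return [(gen, lanes)
            for gen, per_lane in PER_LANE.items()
            for lanes in widths
            if per_lane * lanes >= measured_mb_s]


# Pinned-memory results from the Dell Precision T3400 post.
for label, rate in (("host to device", 5023.6), ("device to host", 5502.7)):
    print(f"{label}: {rate} MB/s -> feasible links: {feasible_links(rate)}")
```

Of the configurations considered, only a PCIe 2.0 x16 link has a peak above both measured rates. This suggests, without confirming, that this sample is not running on a PCIe 1.x link.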