bandwidthTest.exe on Tesla C2050 possible slow speed

Guys,
I am running the bandwidthTest.exe code from the SDK on a Tesla C2050 in a Windows Server 2008 R2 machine. I have PCI Express x16 gen2.
I see Host-Device bandwidth of 3GB/s which becomes 6GB/s with pinned memory.

I was expecting higher bandwidth. What could be the problem?
Thanks

L

No problem I can see. Those numbers are at the very upper end of what to expect from every PCI-e 2.0 platform I am familiar with.
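As a rough sanity check on "upper end", here is a back-of-the-envelope sketch of the per-direction link rate of a PCIe 2.0 x16 slot. It assumes the standard 5 GT/s per lane and 8b/10b line coding, and it ignores packet and protocol overhead, which lowers the achievable payload rate further. The variable and function names are just for illustration.

```python
# Rough per-direction peak for a PCIe 2.0 x16 link.
TRANSFERS_PER_S_PER_LANE = 5.0e9   # PCIe 2.0 signalling rate: 5 GT/s per lane
LANES = 16                         # x16 slot
ENCODING_EFFICIENCY = 8 / 10       # 8b/10b line coding: 8 data bits per 10 bits on the wire


def pcie_peak_bytes_per_s(transfers_per_s, lanes, encoding_efficiency):
    """Raw link rate in bytes/s for one direction, after line coding."""
    bits_per_s = transfers_per_s * lanes * encoding_efficiency
    return bits_per_s / 8


peak = pcie_peak_bytes_per_s(TRANSFERS_PER_S_PER_LANE, LANES, ENCODING_EFFICIENCY)
measured = 6.0e9                   # pinned-memory figure quoted in the question
efficiency = measured / peak

print(f"link peak : {peak / 1e9:.1f} GB/s")
print(f"measured  : {measured / 1e9:.1f} GB/s ({efficiency:.0%} of link peak)")
```

Under these assumptions the link tops out at about 8 GB/s per direction, so 6 GB/s with pinned memory is already roughly three quarters of what the wire can carry before any protocol overhead is counted.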

Here’s mine, running on 64-bit Ubuntu 10.04, PCIe 2.0 x16:

./bandwidthTest

Running on...

Device 0: Tesla C2050

 Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory

   Transfer Size (Bytes)	Bandwidth(MB/s)

   33554432			5287.9

Device to Host Bandwidth, 1 Device(s), Paged memory

   Transfer Size (Bytes)	Bandwidth(MB/s)

   33554432			4344.0

Device to Device Bandwidth, 1 Device(s)

   Transfer Size (Bytes)	Bandwidth(MB/s)

   33554432			91848.9

./bandwidthTest --memory=pinned

Running on...

Device 0: Tesla C2050

 Quick Mode

Host to Device Bandwidth, 1 Device(s), Pinned memory

   Transfer Size (Bytes)	Bandwidth(MB/s)

   33554432			5831.2

Device to Host Bandwidth, 1 Device(s), Pinned memory

   Transfer Size (Bytes)	Bandwidth(MB/s)

   33554432			6167.7

Device to Device Bandwidth, 1 Device(s)

   Transfer Size (Bytes)	Bandwidth(MB/s)

   33554432			91850.6

The non-pinned (pageable) variant requires an extra copy in system memory on the host side (user memory <-> pinned DMA buffer), so its performance depends heavily on the system memory throughput of the host. The latest x86 platforms have very good system memory throughput, so on them the difference between the pinned and non-pinned cases is much smaller than on older systems, where the throughput of system memory wasn’t much higher than the throughput of PCIe gen2. Judging from the numbers posted, zeus13i seems to have such a latest-generation x86 host system.
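To make the dependence on host memory throughput concrete, here is a deliberately simple model. It assumes the staging copy and the PCIe DMA run strictly one after the other, which is a pessimistic simplification and not a description of what the CUDA driver actually does (a driver may overlap the two). The host copy rates of 6 and 30 GB/s are hypothetical values picked for illustration; only the 6 GB/s pinned figure comes from the thread.

```python
def serial_pageable_bandwidth(host_copy_gbps, pcie_gbps):
    """Effective GB/s when the host staging copy and the DMA run back to back.

    Time per byte is the sum of the two stages, so the rates combine
    like resistors in parallel. This is a worst-case, no-overlap model.
    """
    if host_copy_gbps <= 0 or pcie_gbps <= 0:
        raise ValueError("bandwidths must be positive")
    return 1.0 / (1.0 / host_copy_gbps + 1.0 / pcie_gbps)


PINNED_GBPS = 6.0  # pinned-memory rate reported in the question

# Hypothetical host memcpy rates: an older host vs. a newer one.
slow_host = serial_pageable_bandwidth(6.0, PINNED_GBPS)
fast_host = serial_pageable_bandwidth(30.0, PINNED_GBPS)

print(f"host copy  6 GB/s -> pageable ~{slow_host:.1f} GB/s")
print(f"host copy 30 GB/s -> pageable ~{fast_host:.1f} GB/s")
```

In this model a host whose memory copy is no faster than the PCIe link halves the pageable rate, while a host with much faster memory loses comparatively little, which matches the qualitative point above.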

Why does the bandwidth test on the C2050 give much lower numbers than the peak performance of 144GB/s?

Please take a look at the following forum thread, where I answered this question with respect to a Fermi-based Quadro: