[BENCHMARKS] Tesla C1060 vs. Tesla C2050: quite odd results

Hi all,

I got my hands on two different CUDA-enabled hardware configurations:

Hardware 1

    CPU: 2 x Intel Xeon X5560 (quad-core)
    RAM: 6 x HMT151R7BFR4C-H9, 4 GB each (24 GB total), 1333 MHz
    GPUs: 4 x Tesla C1060

Hardware 2

    CPU: 2 x Intel Xeon X5570 (quad-core)
    RAM: 6 x HMT125R7BFR8C-H9, 2 GB each (12 GB total), 1333 MHz
    GPUs: 4 x Tesla C2050 (Fermi)

Running the bandwidthTest sample from the CUDA SDK, I get some strange results (the figures vary a little between runs, but the differences are always noticeable):
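For reference, a paged-memory measurement boils down to timing cudaMemcpy to and from an ordinary malloc'd buffer. This is a minimal sketch, not the SDK's bandwidthTest source: the iteration count (10), the host wall-clock timer, and reporting in MiB/s (1 MB = 2^20 bytes) are my assumptions, so absolute figures may not match the SDK output exactly.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s: %s\n", #call,                   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Bytes moved -> MiB/s (assumption: 1 MB = 2^20 bytes).
static double to_mib_per_s(size_t bytes, int iters, double seconds) {
    return static_cast<double>(bytes) * iters / (1 << 20) / seconds;
}

// Time `iters` synchronous copies with a host clock. Copies from pageable
// memory are staged by the driver through a pinned buffer, so the host timer
// also captures that extra CPU-side copy.
static double pageable_bandwidth(void *dst, const void *src, size_t bytes,
                                 cudaMemcpyKind kind, int iters) {
    CHECK(cudaDeviceSynchronize());
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i)
        CHECK(cudaMemcpy(dst, src, bytes, kind));
    CHECK(cudaDeviceSynchronize());  // make sure the last copy has completed
    auto t1 = std::chrono::steady_clock::now();
    return to_mib_per_s(bytes, iters,
                        std::chrono::duration<double>(t1 - t0).count());
}

int main() {
    const size_t bytes = 33554432;     // 32 MiB, the quick-mode size in the output
    const int iters = 10;              // assumption: arbitrary iteration count
    void *h_buf = std::malloc(bytes);  // ordinary pageable host memory
    if (h_buf == nullptr) return EXIT_FAILURE;
    std::memset(h_buf, 1, bytes);      // fault the pages in before timing
    void *d_buf = nullptr;
    CHECK(cudaMalloc(&d_buf, bytes));

    std::printf("Host to Device, pageable: %.1f MiB/s\n",
                pageable_bandwidth(d_buf, h_buf, bytes, cudaMemcpyHostToDevice, iters));
    std::printf("Device to Host, pageable: %.1f MiB/s\n",
                pageable_bandwidth(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost, iters));

    CHECK(cudaFree(d_buf));
    std::free(h_buf);
    return 0;
}
```

Compile with `nvcc` and run it once per machine; only the relative numbers between HW1 and HW2 are meaningful.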

Hardware 1

./bandwidthTest Starting...

Running on...
Device 0: Tesla C1060
 Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			5033.9

Device to Host Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			2946.1

Device to Device Bandwidth, 1 Device(s)
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			73801.6

[bandwidthTest] - Test results:
PASSED

Press <Enter> to Quit...

-----------------------------------------------------------

Hardware 2

[bandwidthTest]
./bandwidthTest Starting...

Running on...
Device 0: Tesla C2050
 Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			4405.8

Device to Host Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			2978.5

Device to Device Bandwidth, 1 Device(s)
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			86622.1

[bandwidthTest] - Test results:
PASSED

Press <Enter> to Quit...

-----------------------------------------------------------

So something must be wrong. How come HW1's host-to-device bandwidth is higher than HW2's (5033.9 vs. 4405.8 MB/s with paged memory)?
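One thing worth checking: for copies from pageable memory, the CUDA driver first copies the data into a pinned staging buffer on the host, so the host's own memory-copy speed is part of what the paged-memory numbers measure. Below is a rough, host-only sketch for comparing plain memcpy throughput on the two machines. The 32 MiB size matches the runs above; the 10 iterations and MiB/s units are assumptions, and it is only a coarse check, not a diagnosis.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Bytes copied -> MiB/s (assumption: 1 MB = 2^20 bytes).
static double to_mib_per_s(size_t bytes, int iters, double seconds) {
    return static_cast<double>(bytes) * iters / (1 << 20) / seconds;
}

int main() {
    const size_t bytes = 33554432;  // same 32 MiB size as the bandwidthTest runs
    const int iters = 10;           // assumption: arbitrary iteration count
    char *src = static_cast<char *>(std::malloc(bytes));
    char *dst = static_cast<char *>(std::malloc(bytes));
    if (src == nullptr || dst == nullptr) return EXIT_FAILURE;
    std::memset(src, 1, bytes);  // fault in both buffers before timing
    std::memset(dst, 0, bytes);

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        src[0] = static_cast<char>(i);  // change the source so each copy differs
        std::memcpy(dst, src, bytes);
    }
    auto t1 = std::chrono::steady_clock::now();
    double seconds = std::chrono::duration<double>(t1 - t0).count();

    // Read a byte of dst so the copied data is actually used.
    std::printf("host memcpy: %.1f MiB/s (dst[0] = %d)\n",
                to_mib_per_s(bytes, iters, seconds), dst[0]);
    std::free(src);
    std::free(dst);
    return 0;
}
```

It uses no CUDA calls, so it builds with `nvcc` or any C++11 compiler.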

Any suggestions are welcome. Thank you all.

A.

Out of curiosity, can you also post the results from a run with --memory=pinned?
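For context, --memory=pinned switches bandwidthTest to page-locked host memory, which the GPU can DMA from directly without the staging copy. Here is a minimal sketch of such a transfer, assuming cudaHostAlloc for the allocation; it is not the SDK code. The "wc" command-line switch is a hypothetical addition of mine to mimic the "Write-Combined Memory Enabled" mode reported by the pinned runs in this thread, and the iteration count and MiB/s units are also assumptions.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s: %s\n", #call,                   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Bytes moved in `ms` milliseconds -> MiB/s (assumption: 1 MB = 2^20 bytes).
static double to_mib_per_s(size_t bytes, int iters, float ms) {
    return static_cast<double>(bytes) * iters / (1 << 20) / (ms / 1000.0);
}

// cudaHostAlloc flags: write-combined or default pinned memory.
static unsigned int pinned_flags(bool write_combined) {
    return write_combined ? cudaHostAllocWriteCombined : cudaHostAllocDefault;
}

int main(int argc, char **argv) {
    const size_t bytes = 33554432;  // 32 MiB, as in the quick-mode output
    const int iters = 10;           // assumption: arbitrary iteration count
    // Hypothetical switch: pass "wc" to request write-combined pinned memory.
    const bool wc = argc > 1 && std::strcmp(argv[1], "wc") == 0;

    void *h_buf = nullptr, *d_buf = nullptr;
    CHECK(cudaHostAlloc(&h_buf, bytes, pinned_flags(wc)));  // page-locked host memory
    std::memset(h_buf, 1, bytes);  // CPU writes are fine even for write-combined memory
    CHECK(cudaMalloc(&d_buf, bytes));

    // The GPU can DMA straight from pinned memory, so time the copies on the
    // GPU with events recorded in the default stream.
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    CHECK(cudaEventRecord(start, 0));
    for (int i = 0; i < iters; ++i)
        CHECK(cudaMemcpyAsync(d_buf, h_buf, bytes, cudaMemcpyHostToDevice, 0));
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));
    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));

    std::printf("Host to Device, pinned%s: %.1f MiB/s\n",
                wc ? " (write-combined)" : "", to_mib_per_s(bytes, iters, ms));

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(d_buf));
    CHECK(cudaFreeHost(h_buf));
    return 0;
}
```

Write-combined memory is fast for the CPU to write and for the GPU to read, but slow for the CPU to read back, so it suits host-to-device buffers.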

OK, now it makes sense. Here are the results with --memory=pinned:

Hardware 1

[bandwidthTest]
./bandwidthTest Starting...

Running on...
Device 0: Tesla C1060
 Quick Mode

Host to Device Bandwidth, 1 Device(s), Pinned memory, Write-Combined Memory Enabled
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			5721.4

Device to Host Bandwidth, 1 Device(s), Pinned memory, Write-Combined Memory Enabled
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			3436.4

Device to Device Bandwidth, 1 Device(s)
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			73711.0

[bandwidthTest] - Test results:
PASSED

Press <Enter> to Quit...

-----------------------------------------------------------

Hardware 2

[bandwidthTest]
./bandwidthTest Starting...

Running on...
Device 0: Tesla C2050
 Quick Mode

Host to Device Bandwidth, 1 Device(s), Pinned memory, Write-Combined Memory Enabled
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			5752.5

Device to Host Bandwidth, 1 Device(s), Pinned memory, Write-Combined Memory Enabled
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			3393.1

Device to Device Bandwidth, 1 Device(s)
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			86614.2

[bandwidthTest] - Test results:
PASSED

Press <Enter> to Quit...

-----------------------------------------------------------

Could you elaborate on that? Why does HW2 get a much bigger boost from pinned memory than HW1? Host to device goes from 4405.8 to 5752.5 MB/s on HW2, but only from 5033.9 to 5721.4 MB/s on HW1.
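To put numbers on the difference, here is a small host-only sketch that computes the relative change from paged to pinned memory using the figures posted in this thread. Nothing is measured here, and percent_gain is a hypothetical helper.

```cuda
#include <cstdio>

// Percentage change going from `before` to `after`.
static double percent_gain(double before, double after) {
    return (after - before) / before * 100.0;
}

int main() {
    // (paged, pinned) bandwidth pairs in MB/s, copied from the outputs in this thread.
    std::printf("HW1 (C1060) host to device: %+.1f%%\n", percent_gain(5033.9, 5721.4));
    std::printf("HW2 (C2050) host to device: %+.1f%%\n", percent_gain(4405.8, 5752.5));
    std::printf("HW1 (C1060) device to host: %+.1f%%\n", percent_gain(2946.1, 3436.4));
    std::printf("HW2 (C2050) device to host: %+.1f%%\n", percent_gain(2978.5, 3393.1));
    return 0;
}
```

With these inputs it reports +13.7% (HW1) vs. +30.6% (HW2) for host to device, but similar gains for device to host (+16.6% vs. +13.9%), so the gap is specific to the paged host-to-device path.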

Thanks!

OK, never mind, I had a look at the documentation.

Thanks for your help.