Hi all,
I got my hands on two different CUDA-enabled hardware configurations:
[i]
Hardware 1
CPU: 2 x Intel Xeon X5560 (quad-core)
RAM: 6 x HMT151R7BFR4C-H9, 4 GB each (= 24 GB), 1333 MHz
Video: 4 Tesla C1060
Hardware 2
CPU: 2 x Intel Xeon X5570 (quad-core)
RAM: 6 x HMT125R7BFR8C-H9, 2 GB each (= 12 GB), 1333 MHz
Video: 4 Tesla C2050 (Fermi)
[/i]
Running the bandwidthTest sample from the SDK, I get some strange results (the figures vary a little between runs, but the difference between the two machines is always significant).
Hardware 1
./bandwidthTest Starting...
Running on...
Device 0: Tesla C1060
Quick Mode
Host to Device Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5033.9
Device to Host Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2946.1
Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 73801.6
[bandwidthTest] - Test results:
PASSED
Press <Enter> to Quit...
-----------------------------------------------------------
Hardware 2
[bandwidthTest]
./bandwidthTest Starting...
Running on...
Device 0: Tesla C2050
Quick Mode
Host to Device Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 4405.8
Device to Host Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2978.5
Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 86622.1
[bandwidthTest] - Test results:
PASSED
Press <Enter> to Quit...
-----------------------------------------------------------
So something seems to be wrong. How come HW1’s host->device bandwidth (5033.9 MB/s) is better than HW2’s (4405.8 MB/s), even though HW2 has the newer Fermi cards?
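In case anyone wants to reproduce this outside the SDK sample, here is a minimal sketch of the same measurement. It times a 32 MB (33554432 bytes) host->device copy from pageable and then from pinned host memory, so the two machines can be compared on both paths. The `CHECK` macro, the `h2dBandwidthMBs` helper, the repetition count, and the choice of 1 MB = 2^20 bytes are my own assumptions for illustration, not code taken from bandwidthTest.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message on any CUDA runtime error.
#define CHECK(call)                                                      \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            fprintf(stderr, "CUDA error %s at %s:%d\n",                  \
                    cudaGetErrorString(err_), __FILE__, __LINE__);       \
            exit(EXIT_FAILURE);                                          \
        }                                                                \
    } while (0)

// Hypothetical helper: time `reps` host->device copies of `bytes` bytes
// and return the average bandwidth in MB/s (assuming 1 MB = 2^20 bytes).
static double h2dBandwidthMBs(const void* hostBuf, void* devBuf,
                              size_t bytes, int reps)
{
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    CHECK(cudaEventRecord(start, 0));
    for (int i = 0; i < reps; ++i)
        CHECK(cudaMemcpy(devBuf, hostBuf, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));

    double megabytes = (double)bytes * reps / (1024.0 * 1024.0);
    return megabytes / (ms / 1000.0);
}

int main()
{
    const size_t bytes = 33554432;  // same transfer size as in the output above
    const int reps = 10;            // arbitrary repetition count

    CHECK(cudaSetDevice(0));

    void* devBuf = NULL;
    CHECK(cudaMalloc(&devBuf, bytes));

    // Pageable host memory: ordinary malloc.
    void* pageable = malloc(bytes);
    if (pageable == NULL) {
        fprintf(stderr, "malloc failed\n");
        return EXIT_FAILURE;
    }
    memset(pageable, 1, bytes);  // touch the pages before timing

    // Pinned (page-locked) host memory.
    void* pinned = NULL;
    CHECK(cudaMallocHost(&pinned, bytes));
    memset(pinned, 1, bytes);

    printf("Host to Device, pageable: %.1f MB/s\n",
           h2dBandwidthMBs(pageable, devBuf, bytes, reps));
    printf("Host to Device, pinned:   %.1f MB/s\n",
           h2dBandwidthMBs(pinned, devBuf, bytes, reps));

    CHECK(cudaFreeHost(pinned));
    free(pageable);
    CHECK(cudaFree(devBuf));
    return 0;
}
```

It should build with something like `nvcc -o h2d h2d.cu` (the file name is arbitrary). If the pinned figures are close on both machines and only the pageable ones differ, the gap would point at the host side rather than at the cards. As far as I know, the SDK sample can also be rerun with `--memory=pinned` for the same comparison.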
Any suggestions are welcome. Thanks in advance.
A.