D2D is a bit lower than peak because the current cudaMemcpy kernel could be slightly better optimized (at least on the C1060). The best I've seen on a C1060 is 76 GB/s or so. The C1060's theoretical peak is only 102 GB/s, so that's not too far off (and I guess you have to take into account signaling, packet size, and everything else). But no, your results are right in line with what I expect.
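To put the 76 GB/s and 102 GB/s figures above side by side, here is a rough back-of-the-envelope sketch. The bus width and memory clock are not stated in this thread; they are assumed from the published C1060 specs (512-bit GDDR3 at 800 MHz, double data rate), so treat the result as an estimate rather than a measured limit.

```python
# Assumed C1060 memory configuration (not from this thread):
BUS_WIDTH_BITS = 512        # memory interface width
MEM_CLOCK_HZ = 800e6        # GDDR3 memory clock
TRANSFERS_PER_CLOCK = 2     # double data rate

# Theoretical peak = bytes per transfer * transfers per second
peak_gb_s = BUS_WIDTH_BITS / 8 * MEM_CLOCK_HZ * TRANSFERS_PER_CLOCK / 1e9

observed_gb_s = 76.0        # best D2D figure quoted in the post above
fraction_of_peak = observed_gb_s / peak_gb_s

print(f"theoretical peak: {peak_gb_s:.1f} GB/s")
print(f"observed / peak:  {fraction_of_peak:.0%}")
```

Under those assumptions the observed D2D copy rate lands at roughly three quarters of the theoretical figure.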
Are you running the latest BIOS on the Supermicro twin?
I have the same model in my cluster, and these are the results of the bandwidth test (CUDA 2.0, driver 177.70.31):
[cuda@compute-0-6 ~]$ /usr/local/NVIDIA_CUDA_SDK/bin/linux/release/bandwidthTest -noprompt
Using device 0: Tesla T10 Processor
Quick Mode
Host to Device Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2522.4
Quick Mode
Device to Host Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2085.9
Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 73419.8
&&&& Test PASSED
[cuda@compute-0-6 ~]$ /usr/local/NVIDIA_CUDA_SDK/bin/linux/release/bandwidthTest -memory=pinned -noprompt
Using device 0: Tesla T10 Processor
Quick Mode
Host to Device Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5651.9
Quick Mode
Device to Host Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5301.2
Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 73344.0
I tried the same test, but I get the following error:
Running on…
device 0:Device Emulation (CPU)
Quick Mode
Host to Device Bandwidth for Pageable memory
cudaSafeCall() Runtime API error in file <bandwidthTest.cu>, line 657 : feature is not yet implemented.
When running under driver 180.22 with a GTX280, I observed 5.7 GB/sec in both directions, so I believe the mobo/BIOS is OK. The only configuration change that I made when setting up the Tesla was to upgrade to driver version 180.29 (the latest available release for Tesla).
1200 MB/sec clearly indicates either a H/W or system configuration problem.
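For a sense of scale, a quick sketch comparing the 1200 MB/sec figure with the 5.7 GB/sec seen on the same board with the GTX280. The helper name is purely illustrative, and the conversion assumes 1 GB = 1000 MB, which may not match how bandwidthTest defines its units.

```python
def fraction_of_reference(measured_mb_s, reference_mb_s):
    """Return measured bandwidth as a fraction of a known-good reference."""
    return measured_mb_s / reference_mb_s

reference_mb_s = 5.7 * 1000   # GTX280 on the same board (assumes 1 GB = 1000 MB)
measured_mb_s = 1200.0        # figure reported for the Tesla

ratio = fraction_of_reference(measured_mb_s, reference_mb_s)
print(f"measured is {ratio:.0%} of the reference")
```

That is roughly a fifth of what the same slot delivered with the other card, which is far too large a gap to put down to run-to-run noise.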