bandwidth test

Hi, below are the bandwidth test results on my computer. Are they normal, or too low? Host to device is only about 2 GB/s…

yliu@yliu-desktop-ubuntu:~/Workspace/CUDA/sdk/bin/linux/release$ ./bandwidthTest
Running on…
  device 0:GeForce GTX 280
Quick Mode
Host to Device Bandwidth for Pageable memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1928.0

Quick Mode
Device to Host Bandwidth for Pageable memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1765.5

Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 125984.2

&&&& Test PASSED

Press ENTER to exit…

That doesn’t look too bad for unpinned host memory. What motherboard/chipset and CPU are you using?

Try the pinned memory bandwidth test; you should get better results:

bandwidthTest --memory=pinned
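The reason pinned memory is faster: the GPU can DMA directly out of page-locked host memory, while pageable transfers go through an extra staging copy inside the driver. A minimal sketch of the two paths (not the SDK's `bandwidthTest.cu` source; error checking is omitted, and reporting MB as 2^20 bytes is an assumption based on older SDK versions):

```cuda
// Sketch: compare a pageable (malloc) vs. pinned (cudaMallocHost)
// host-to-device copy of the same 32 MiB buffer used in Quick Mode.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Time one host-to-device cudaMemcpy with CUDA events, in milliseconds.
static float timeH2D(void *host, void *dev, size_t bytes) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const size_t bytes = 32u << 20;   // 33554432, as in the output above
    void *dev, *pageable, *pinned;
    cudaMalloc(&dev, bytes);
    pageable = malloc(bytes);         // ordinary pageable allocation
    cudaMallocHost(&pinned, bytes);   // page-locked (pinned) allocation

    float msPageable = timeH2D(pageable, dev, bytes);
    float msPinned   = timeH2D(pinned, dev, bytes);

    // Report MB/s with MB = 2^20 bytes (assumed SDK convention).
    printf("pageable: %.1f MB/s\n", bytes / (1 << 20) / (msPageable / 1000.0f));
    printf("pinned:   %.1f MB/s\n", bytes / (1 << 20) / (msPinned / 1000.0f));

    cudaFreeHost(pinned);
    free(pageable);
    cudaFree(dev);
    return 0;
}
```

Compile with nvcc. On a PCIe Gen 2 x16 slot the pinned figure should approach 5-6 GB/s, with pageable landing noticeably lower.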

OK, the pinned memory bandwidth test looks better. About 4 GB/s from host to device. Thanks!

yliu@yliu-desktop-ubuntu:~/Workspace/CUDA/sdk/bin/linux/release$ ./bandwidthTest --memory=pinned
Running on…
  device 0:GeForce GTX 280
Quick Mode
Host to Device Bandwidth for Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 4246.3

Quick Mode
Device to Host Bandwidth for Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 4615.4

Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 125786.2

&&&& Test PASSED

Press ENTER to exit…

Does anybody get more than my 5.88 GB/s? See my blog…

I bought a system with a C1060 in it. I am seeing device-to-device bandwidth of only 73647.9 MB/s, which is well below the spec. Is there a setting I should enable? I am using CUDA 2.1 beta.

Running on…
  device 0:Tesla C1060
Quick Mode
Host to Device Bandwidth for Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5765.7

Quick Mode
Device to Host Bandwidth for Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5593.5

Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 73681.8

&&&& Test PASSED

Press ENTER to exit…

D2D is a bit lower than peak because the current cudaMemcpy kernel could be slightly better optimized (at least on the C1060). The best D2D figure I've seen on a C1060 is around 76 GB/s, and the theoretical peak is only 102 GB/s, so it's not too far off (and you have to take signaling, packet size, and everything else into account). So no, your results are right in line with what I'd expect.
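For anyone comparing these D2D numbers against the memory spec: in the SDK versions I've looked at, the device-to-device test counts every byte twice, once read from the source buffer and once written to the destination, so the reported figure is meant to be compared directly with the memory bandwidth spec. A hedged sketch of that accounting (assumption: your SDK version does the same):

```cuda
// Sketch: measure device-to-device copy bandwidth the way the SDK
// test does, counting read + write traffic (hence the factor of 2).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 32u << 20;   // 33554432 bytes, as in Quick Mode
    void *src, *dst;
    cudaMalloc(&src, bytes);
    cudaMalloc(&dst, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int reps = 10;              // average over several copies
    cudaEventRecord(start, 0);
    for (int i = 0; i < reps; ++i)
        cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // Factor of 2: each copied byte is both read from src and written to dst.
    double mbps = 2.0 * bytes * reps / (1 << 20) / (ms / 1000.0);
    printf("device-to-device: %.1f MB/s\n", mbps);

    cudaFree(src);
    cudaFree(dst);
    return 0;
}
```

With that accounting, ~73 GB/s reported means roughly 36.5 GB/s of one-way copy throughput against a 102 GB/s spec, which is about what a memcpy kernel of that era achieved.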

I also got similar Dev2Dev results (73 GB/s), but I have a serious problem with my host-device connection:

  • upload and download differ greatly,

  • both are very slow compared with the 5-6 GB/s (or more) I would expect from a PCIe Gen 2 connection, and

  • there is a huge difference between pageable and pinned memory.

Note that I have:

  • an NVIDIA Tesla S1070, connected via two double-wired NVIDIA PCIe external cables to

  • a Supermicro SuperServer 6015TW-TB

[sudol@gpu1 release]$ ./bandwidthTest
Running on…
  device 0:Tesla T10 Processor
Quick Mode
Host to Device Bandwidth for Pageable memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1376.1

Quick Mode
Device to Host Bandwidth for Pageable memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 837.6

Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 73554.8

&&&& Test PASSED

Press ENTER to exit…

[sudol@gpu1 release]$ ./bandwidthTest --memory=pinned
Running on…
  device 0:Tesla T10 Processor
Quick Mode
Host to Device Bandwidth for Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 4247.2

Quick Mode
Device to Host Bandwidth for Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1807.9

Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 73487.2

&&&& Test PASSED

Press ENTER to exit…

Do you have any idea why I got so low performance?

Are you running the latest BIOS on the Supermicro twin?

I have the same model in my cluster, and these are the results of the bandwidth test (CUDA 2.0, driver 177.70.31):

[cuda@compute-0-6 ~]$ /usr/local/NVIDIA_CUDA_SDK/bin/linux/release/bandwidthTest -noprompt
Using device 0: Tesla T10 Processor
Quick Mode
Host to Device Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2522.4

Quick Mode
Device to Host Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2085.9

Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 73419.8

&&&& Test PASSED
[cuda@compute-0-6 ~]$ /usr/local/NVIDIA_CUDA_SDK/bin/linux/release/bandwidthTest -memory=pinned -noprompt
Using device 0: Tesla T10 Processor
Quick Mode
Host to Device Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5651.9

Quick Mode
Device to Host Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5301.2

Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 73344.0

&&&& Test PASSED

I tried the same test, but I get the following error:

Running on…
device 0:Device Emulation (CPU)
Quick Mode
Host to Device Bandwidth for Pageable memory
cudaSafeCall() Runtime API error in file <bandwidthTest.cu>, line 657 : feature is not yet implemented.
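The "Device Emulation (CPU)" line is the key: the runtime did not find a usable GPU (or the SDK was built with -deviceemu), and many runtime features that bandwidthTest relies on are unimplemented in emulation mode. A hedged sketch of how to check what the runtime actually sees (assumption: the major == 9999 marker for the emulation device matches your CUDA 2.x toolkit):

```cuda
// Sketch: enumerate CUDA devices and flag the emulation device,
// which CUDA 2.x reports with compute capability 9999.9999.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        if (prop.major == 9999)
            // No real GPU / driver was found; only the CPU emulator is visible.
            printf("device %d: %s (emulation only)\n", i, prop.name);
        else
            printf("device %d: %s (compute %d.%d)\n",
                   i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```

If only the emulation device shows up, check that the NVIDIA driver is installed and loaded before re-running the test.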

Hi All,

I'm having a similar problem with Host->Device bandwidth on a Tesla S1070. Here are the specifics of my configuration:

Motherboard: MSI x48 Platinum, latest BIOS (v2.4)

RedHat 5.2 64-bit

8 GB RAM

CUDA 2.1, Driver version 180.29

Results of BW test (same for all 4 Tesla Devices):

[codebox]trio:~/sdk/bin/linux/release> ./bandwidthTest --memory=pinned --device=0
Running on…
  device 0:Tesla T10 Processor
Quick Mode
Host to Device Bandwidth for Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1195.6

Quick Mode
Device to Host Bandwidth for Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5744.7

Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 73386.1

&&&& Test PASSED

Press ENTER to exit…
[/codebox]

When running under driver 180.22 with a GTX 280, I observed 5.7 GB/s in both directions, so I believe the motherboard/BIOS is OK. The only configuration change I made when setting up the Tesla was upgrading to driver version 180.29 (the latest available release for Tesla).

1200 MB/s clearly indicates either a hardware or a system configuration problem.

Any suggestions?