bandwidth test

Hi, below are the bandwidth test results on my computer. Are they normal, or too low? Host to device is only about 2 GB/s…

yliu@yliu-desktop-ubuntu:~/Workspace/CUDA/sdk/bin/linux/release$ ./bandwidthTest
Running on…
  device 0:GeForce GTX 280
Quick Mode
Host to Device Bandwidth for Pageable memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1928.0

Quick Mode
Device to Host Bandwidth for Pageable memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1765.5

Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 125984.2

&&&& Test PASSED

Press ENTER to exit…

That doesn’t look too bad for unpinned host memory. What motherboard/chipset and CPU are you using?

Try the pinned memory bandwidth test; you should get better results:

bandwidthTest --memory=pinned
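The reason pinned memory is faster: the GPU can DMA directly out of page-locked host memory, while pageable transfers go through an extra staging copy inside the driver. A minimal sketch of the two paths (not the SDK's `bandwidthTest.cu` source; error checking is omitted, and reporting MB as 2^20 bytes is an assumption based on older SDK versions):

```cuda
// Sketch: compare a pageable (malloc) vs. pinned (cudaMallocHost)
// host-to-device copy of the same 32 MiB buffer used in Quick Mode.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Time one host-to-device cudaMemcpy with CUDA events, in milliseconds.
static float timeH2D(void *host, void *dev, size_t bytes) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const size_t bytes = 32u << 20;   // 33554432, as in the output above
    void *dev, *pageable, *pinned;
    cudaMalloc(&dev, bytes);
    pageable = malloc(bytes);         // ordinary pageable allocation
    cudaMallocHost(&pinned, bytes);   // page-locked (pinned) allocation

    float msPageable = timeH2D(pageable, dev, bytes);
    float msPinned   = timeH2D(pinned, dev, bytes);

    // Report MB/s with MB = 2^20 bytes (assumed SDK convention).
    printf("pageable: %.1f MB/s\n", bytes / (1 << 20) / (msPageable / 1000.0f));
    printf("pinned:   %.1f MB/s\n", bytes / (1 << 20) / (msPinned / 1000.0f));

    cudaFreeHost(pinned);
    free(pageable);
    cudaFree(dev);
    return 0;
}
```

Compile with nvcc. On a PCIe Gen 2 x16 slot the pinned figure should approach 5-6 GB/s, with pageable landing noticeably lower.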

OK, the pinned memory bandwidth test looks better. About 4 GB/s from host to device. Thanks!

yliu@yliu-desktop-ubuntu:~/Workspace/CUDA/sdk/bin/linux/release$ ./bandwidthTest --memory=pinned
Running on…
  device 0:GeForce GTX 280
Quick Mode
Host to Device Bandwidth for Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 4246.3

Quick Mode
Device to Host Bandwidth for Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 4615.4

Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 125786.2

&&&& Test PASSED

Press ENTER to exit…

Does anybody get more than my 5.88 GB/s? See my blog…

I bought a system with a C1060 in it. I am seeing device-to-device bandwidth of only 73647.9 MB/s, which is well below the spec. Is there a setting I should enable? I am using CUDA 2.1 beta.

Running on…
  device 0:Tesla C1060
Quick Mode
Host to Device Bandwidth for Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5765.7

Quick Mode
Device to Host Bandwidth for Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5593.5

Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 73681.8

&&&& Test PASSED

Press ENTER to exit…

D2D is a bit lower than peak because the current cudaMemcpy kernel could be slightly better optimized (at least on the C1060). The best D2D figure I've seen on a C1060 is around 76 GB/s, and the theoretical peak is only 102 GB/s, so it's not too far off (and you have to take signaling, packet size, and everything else into account). So no, your results are right in line with what I'd expect.
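For anyone comparing these D2D numbers against the memory spec: in the SDK versions I've looked at, the device-to-device test counts every byte twice, once read from the source buffer and once written to the destination, so the reported figure is meant to be compared directly with the memory bandwidth spec. A hedged sketch of that accounting (assumption: your SDK version does the same):

```cuda
// Sketch: measure device-to-device copy bandwidth the way the SDK
// test does, counting read + write traffic (hence the factor of 2).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 32u << 20;   // 33554432 bytes, as in Quick Mode
    void *src, *dst;
    cudaMalloc(&src, bytes);
    cudaMalloc(&dst, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int reps = 10;              // average over several copies
    cudaEventRecord(start, 0);
    for (int i = 0; i < reps; ++i)
        cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // Factor of 2: each copied byte is both read from src and written to dst.
    double mbps = 2.0 * bytes * reps / (1 << 20) / (ms / 1000.0);
    printf("device-to-device: %.1f MB/s\n", mbps);

    cudaFree(src);
    cudaFree(dst);
    return 0;
}
```

With that accounting, ~73 GB/s reported means roughly 36.5 GB/s of one-way copy throughput against a 102 GB/s spec, which is about what a memcpy kernel of that era achieved.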

I also got similar Dev2Dev results (73 GB/s), but I have a serious problem with my host-device connection:

  • upload and download differ greatly,

  • both are very slow compared with the 5-6 GB/s (or more) I would expect from a PCIe Gen 2 connection, and

  • there is a huge difference between pageable and pinned memory.

Note that I have:

  • an NVIDIA Tesla S1070, connected via two double-wired NVIDIA PCIe external cables to

  • a Supermicro SuperServer 6015TW-TB

[sudol@gpu1 release]$ ./bandwidthTest
Running on…
  device 0:Tesla T10 Processor
Quick Mode
Host to Device Bandwidth for Pageable memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1376.1

Quick Mode
Device to Host Bandwidth for Pageable memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 837.6

Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 73554.8

&&&& Test PASSED

Press ENTER to exit…

[sudol@gpu1 release]$ ./bandwidthTest --memory=pinned
Running on…
  device 0:Tesla T10 Processor
Quick Mode
Host to Device Bandwidth for Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 4247.2

Quick Mode
Device to Host Bandwidth for Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1807.9

Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 73487.2

&&&& Test PASSED

Press ENTER to exit…

Do you have any idea why I got so low performance?

Are you running the latest BIOS on the Supermicro twin?

I have the same model in my cluster, and these are the results of the bandwidth test (CUDA 2.0, driver 177.70.31):

[cuda@compute-0-6 ~]$ /usr/local/NVIDIA_CUDA_SDK/bin/linux/release/bandwidthTest -noprompt
Using device 0: Tesla T10 Processor
Quick Mode
Host to Device Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2522.4

Quick Mode
Device to Host Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2085.9

Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 73419.8

&&&& Test PASSED
[cuda@compute-0-6 ~]$ /usr/local/NVIDIA_CUDA_SDK/bin/linux/release/bandwidthTest -memory=pinned -noprompt
Using device 0: Tesla T10 Processor
Quick Mode
Host to Device Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5651.9

Quick Mode
Device to Host Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5301.2

Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 73344.0

&&&& Test PASSED

I tried the same test, but I get the following error:

Running on…
device 0:Device Emulation (CPU)
Quick Mode
Host to Device Bandwidth for Pageable memory
cudaSafeCall() Runtime API error in file <bandwidthTest.cu>, line 657 : feature is not yet implemented.
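The "Device Emulation (CPU)" line is the key: the runtime did not find a usable GPU (or the SDK was built with -deviceemu), and many runtime features that bandwidthTest relies on are unimplemented in emulation mode. A hedged sketch of how to check what the runtime actually sees (assumption: the major == 9999 marker for the emulation device matches your CUDA 2.x toolkit):

```cuda
// Sketch: enumerate CUDA devices and flag the emulation device,
// which CUDA 2.x reports with compute capability 9999.9999.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        if (prop.major == 9999)
            // No real GPU / driver was found; only the CPU emulator is visible.
            printf("device %d: %s (emulation only)\n", i, prop.name);
        else
            printf("device %d: %s (compute %d.%d)\n",
                   i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```

If only the emulation device shows up, check that the NVIDIA driver is installed and loaded before re-running the test.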

Hi All,

I'm having a similar problem with Host->Device bandwidth on a Tesla S1070. Here are the specifics of my configuration:

Motherboard: MSI x48 Platinum, latest BIOS (v2.4)

RedHat 5.2 64-bit

8 GB RAM

CUDA 2.1, Driver version 180.29

Results of BW test (same for all 4 Tesla Devices):

[codebox]trio:~/sdk/bin/linux/release> ./bandwidthTest --memory=pinned --device=0
Running on…
  device 0:Tesla T10 Processor
Quick Mode
Host to Device Bandwidth for Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1195.6

Quick Mode
Device to Host Bandwidth for Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5744.7

Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 73386.1

&&&& Test PASSED

Press ENTER to exit…
[/codebox]

When running under driver 180.22 with a GTX 280, I observed 5.7 GB/s in both directions, so I believe the motherboard/BIOS is OK. The only configuration change I made when setting up the Tesla was upgrading to driver version 180.29 (the latest available release for Tesla).

1200 MB/s clearly indicates either a hardware or a system configuration problem.

Any suggestions?