Tesla C1060 Memory Bandwidth

I used bandwidthTest in the SDK to test the new C1060 I just bought. The device-to-device memory bandwidth is about 74 GB/s, which is quite different from the one in the spec (102 GB/s). I am wondering what may cause this difference. I am using CUDA 2.3 and Windows 7 64-bit.

Thanks!

I get the same result with CUDA 2.3 on Ubuntu 9.04 64-bit:

stefano@rampage:~/project$ ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release/bandwidthTest
Running on......
      device 0: Tesla C1060

Quick Mode
Host to Device Bandwidth for Pageable memory
Transfer Size (Bytes)   Bandwidth(MB/s)
 33554432               5426.9

Quick Mode
Device to Host Bandwidth for Pageable memory
Transfer Size (Bytes)   Bandwidth(MB/s)
 33554432               4758.4

Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes)   Bandwidth(MB/s)
 33554432               73475.7

&&&& Test PASSED

102 GB/s is marketing/peak performance :)

74 GB/s is good :)

eyal

Even so, it’s discouraging that even NVIDIA’s own memory bandwidth tool achieves only 74 GB/s.

How is the bandwidth measured from device to device? Might it be that reading and writing together are measured, instead of the bandwidth in one direction (reading from memory)?

How much bandwidth do you get with a simple linear access pattern like device_memory[threadIdx.x]?
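Not from the thread, but a minimal sketch of how one might time such a linear pattern; the kernel name, buffer size, and launch configuration are assumptions for illustration:

#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread handles one element, so warps issue
// fully coalesced, linear loads/stores -- the device_memory[threadIdx.x]
// pattern extended across the whole grid.
__global__ void linearCopy(const float *in, float *out, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

int main() {
    const size_t n = 1 << 22;                 // 4M floats = 16 MB per buffer
    float *in, *out;                          // contents don't matter for timing
    cudaMalloc((void **)&in,  n * sizeof(float));
    cudaMalloc((void **)&out, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int iters = 100;
    cudaEventRecord(start, 0);
    for (int k = 0; k < iters; ++k)
        linearCopy<<<(unsigned)((n + 255) / 256), 256>>>(in, out, n);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // Each iteration reads n floats and writes n floats: count both directions.
    double gbps = 2.0 * n * sizeof(float) * iters / (ms / 1e3) / 1e9;
    printf("linear copy bandwidth: %.1f GB/s\n", gbps);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(in);
    cudaFree(out);
    return 0;
}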

If you run the STREAM benchmark, you can see 82 GB/s.

Device Selected 1: “Tesla C1060”
STREAM Benchmark implementation in CUDA
Array size (double precision) = 6000000
Using 384 threads per block, 15625 blocks

Function    Rate (MB/s)    Avg time    Min time    Max time
Copy:       82258.0560     0.0012      0.0012      0.0012
Scale:      82123.8393     0.0012      0.0012      0.0012
Add:        82006.7585     0.0018      0.0018      0.0018
Triad:      82006.7585     0.0018      0.0018      0.0018
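For context on what those four functions do: each is a one-line element-wise kernel. A sketch of Triad (names assumed, not the benchmark’s actual source):

// Sketch of the STREAM Triad operation: a[i] = b[i] + scalar * c[i].
// Two reads plus one write move 24 bytes per double-precision element,
// which is how the MB/s rates above are derived from the timings.
__global__ void triad(double *a, const double *b, const double *c,
                      double scalar, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) a[i] = b[i] + scalar * c[i];
}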

First, bandwidthTest reports GiB/s, not GB/s: 102 GB/s = 95 GiB/s.

In my experience, the sustained RAM bandwidth is usually about 2/3 of the advertised value. Why? Because the advertised number is the burst speed, achieved only when the memory chips transmit on every cycle:

1600 MHz * 8 bytes/bank * 8 banks = 102.4 GB/s ≈ 95 GiB/s

Current memory technologies use burst mode to read 4 or 8 adjacent memory cells in the same row, to avoid paying the column-address latency multiple times. This obviously can’t be sustained, and switching to a different DRAM row takes even longer.

If you look at figure 18 here, it shows 5 cycles of read latency followed by 4 cycles of actual data output (44% of peak bandwidth). Figure 19 shows the best case, where you’re able to completely overlap the column-addressing latency.
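Spelling out that 44% figure: with 5 latency cycles followed by 4 data cycles, only

4 / (5 + 4) = 4/9 ≈ 44%

of the cycles actually carry data.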

Not quite sure I understand how you come to this conclusion. I would compute the theoretical peak memory bandwidth as

0.8 GHz memory clock * 2 (DDR memory) * 512-bit memory bus / 8 (since we want bytes),

which would give 102.4 GiB/s.


“which would give 102.4 GiB/s”
No, it’s 95 GiB/s

The definitions of GHz & GiB are:

1 GHz = 10^9 Hz
1 GiB = 2^30 bytes
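Worked through with those definitions:

0.8 * 10^9 Hz * 2 * (512 / 8) bytes = 102.4 * 10^9 B/s
102.4 * 10^9 B/s / 2^30 B/GiB ≈ 95.4 GiB/s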


My bad, I didn’t realize that GHz is of course decimal-based. Thanks for clarifying!
