Theoretical device-to-device bandwidth

I believe my calculation is correct, but I want to confirm it.

For the GTX 280, is the device-to-device bandwidth calculated as follows?

1.107 GHz (memory clock) * 512 bits (memory interface width) * 2 (in/out combined) / 8 = 141.7 GB/s

which would mean the maximum one-way bandwidth is half of that, i.e. 70.8 GB/s.


Small correction:
1.107 GHz (memory clock) * 2 bits/clock * 512 bits (memory interface width) / 8 = 141.7 GB/s

You can get this bandwidth (well, ~70 to 80% of it) in one direction, i.e. with reads only or writes only, not just with a mix of reads and writes.

Where does that “2 bits/clock” come from?

Are you aware of any example code that can show that?


DDR stands for “double data rate.” On each data pin, one bit is transferred on the rising edge and one on the falling edge of the clock.

Yes. I wrote a test and posted it to the forum more than a year ago.…st&p=292058

Thanks for the answer.

Thanks for the link. The test is well written and easy to read. The read_only_gmem<> and read_only_tex<> tests deliver much more than half of the peak. I am convinced :)

In most of the examples, we see a big mismatch between the theoretical and the measured bandwidth.

The GTX 280's theoretical bandwidth is 141.7 GB/s, but the bandwidth reported by the benchmark scripts is about 70 GiB/s (about 75 GB/s, only around 53% of the theoretical peak).

I have a Tesla card whose theoretical bandwidth is 102 GB/s, but the bandwidth benchmark reports about 71.66 GiB/s (about 76.9 GB/s, roughly 75% of the theoretical peak).

Why is there such a large difference?