Hi everyone,

The theoretical memory bandwidth of a GTX 480 is 177.408 GB/s.

This value comes from www.gpreview.com, and it matches the calculation given in the “CUDA C Best Practices” document, that is, (1848 * 10^6 Hz * (384 bits / 8) * 2 (DDR)) / 10^9 = 177.408 GB/s.
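As a quick sanity check of that arithmetic, here is a minimal Python sketch. The function name and parameter names are just illustrative (they are not from the guide), and it assumes 1 GB = 10^9 bytes.

```python
def theoretical_bandwidth_gb_s(clock_mhz, bus_width_bits, ddr_factor=2):
    """Peak memory bandwidth in GB/s, taking 1 GB = 10^9 bytes."""
    bytes_per_transfer = bus_width_bits / 8            # 384 bits -> 48 bytes
    bytes_per_second = clock_mhz * 1e6 * bytes_per_transfer * ddr_factor
    return bytes_per_second / 1e9


# GTX 480 figures quoted above: 1848 MHz, 384-bit bus, DDR
print(round(theoretical_bandwidth_gb_s(1848, 384), 3))  # 177.408
```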

The profiler also agrees with this bandwidth.

Nevertheless, running the BandwidthTest sample included in the SDK gives a smaller value.

Specifically, it reports a bandwidth of around 118 GB/s for the device-to-device copy (cudaMemcpy-deviceToDevice).

Does anyone know why there is such a difference?

I need a reference to evaluate the performance of some kernels. My best solution achieves a bandwidth of 104 GB/s, and I wonder where the real limit for my device's bandwidth lies (118 or 177 GB/s?).
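For reference, the “CUDA C Best Practices” document defines effective bandwidth as bytes read plus bytes written, divided by 10^9 and by the elapsed time. A minimal Python sketch of that formula follows, together with how 104 GB/s compares against the two candidate limits. The function names are illustrative, not taken from any SDK.

```python
def effective_bandwidth_gb_s(bytes_read, bytes_written, seconds):
    """Effective bandwidth in GB/s: (Br + Bw) / 10^9 / time."""
    return (bytes_read + bytes_written) / 1e9 / seconds


def fraction_of_limit(achieved_gb_s, limit_gb_s):
    """Share of a reference bandwidth that a kernel reaches (0.0 to 1.0)."""
    return achieved_gb_s / limit_gb_s


achieved = 104.0  # GB/s, the figure quoted above
for limit in (177.408, 118.0):  # theoretical peak vs. BandwidthTest result
    print(f"{limit} GB/s -> {fraction_of_limit(achieved, limit):.1%}")
# 177.408 GB/s -> 58.6%
# 118.0 GB/s -> 88.1%
```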

Thank you.