The theoretical bandwidth of a GTX 480 is 177.408 GB/s.
This value comes from www.gpreview.com, and it matches the calculation given in the “CUDA C Best Practices” document, that is, (1848 * 10^6 Hz * (384 bit / 8) * 2 (DDR)) / 10^9 = 177.408 GB/s.
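For reference, the calculation above can be written as a small host-side helper. This is only a sketch: the function name and parameter names are my own (they are not from the SDK or the Best Practices document), and it assumes 1 GB = 10^9 bytes, as in the formula.

```cpp
#include <cassert>

// Theoretical peak memory bandwidth in GB/s (1 GB = 10^9 bytes).
// Hypothetical helper; the name and parameters are illustrative only.
//   clockMHz          memory clock in MHz (1848 in the figure quoted above)
//   busBits           memory interface width in bits (384 for the GTX 480)
//   transfersPerClock transfers per clock cycle (2 for double data rate)
double theoreticalBandwidthGBs(double clockMHz, int busBits, int transfersPerClock)
{
    assert(clockMHz > 0.0 && busBits > 0 && transfersPerClock > 0);
    const double clockHz = clockMHz * 1.0e6;        // MHz -> Hz
    const double bytesPerTransfer = busBits / 8.0;  // bits -> bytes
    return clockHz * bytesPerTransfer * transfersPerClock / 1.0e9;
}
```

Calling theoreticalBandwidthGBs(1848.0, 384, 2) reproduces the 177.408 GB/s figure.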
The profiler also agrees with this bandwidth.
Nevertheless, running the BandwidthTest sample included in the SDK reports a smaller value.
Specifically, it shows a bandwidth of around 118 GB/s for the device-to-device copy (cudaMemcpy, deviceToDevice).
Does anyone know why there is such a difference?
I need a reference against which to evaluate the performance of some kernels. My best solution achieves a bandwidth of 104 GB/s, and I wonder what the real bandwidth limit of my device is (118 or 177 GB/s?).
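To make the comparison concrete, here is a minimal sketch of how an achieved figure can be set against either reference. It assumes the usual effective-bandwidth definition (bytes read plus bytes written, divided by elapsed time, with 1 GB = 10^9 bytes); both function names are hypothetical and not part of the SDK.

```cpp
#include <cassert>

// Effective bandwidth in GB/s: total bytes moved divided by elapsed time.
// Assumption: both the bytes read and the bytes written by the kernel count.
double effectiveBandwidthGBs(double bytesRead, double bytesWritten, double seconds)
{
    assert(seconds > 0.0);
    return (bytesRead + bytesWritten) / 1.0e9 / seconds;
}

// Fraction of a reference bandwidth (theoretical or measured) that was achieved.
double fractionOfReference(double achievedGBs, double referenceGBs)
{
    assert(referenceGBs > 0.0);
    return achievedGBs / referenceGBs;
}
```

With the numbers above, fractionOfReference(104.0, 177.408) is roughly 0.59 and fractionOfReference(104.0, 118.0) is roughly 0.88, so the choice of reference changes the conclusion considerably.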