Global memory throughput on various hardware


I’m working on an algorithm whose bottleneck is global memory access. The reads are essentially random and can’t really be optimized or cached in any way. The kernel only reads from global memory; it never writes.

Between the GTX 260, the GTX 280 and the Tesla C1060, should I see a difference in global memory read throughput?



GTX 260 - 112 GB/s
GTX 280 - 142 GB/s
Tesla C1060 - 102 GB/s

They all have the same architecture, so the bandwidth figures can be interpreted at face value.
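
To put rough numbers on that, here is a small Python sketch that only does the arithmetic on the figures quoted above. It assumes that read throughput scales with peak bandwidth across the three cards, which is a simplification: random reads will reach well below peak on all of them, so treat the ratios as an estimate of the relative difference, not as achievable GB/s. The names `PEAK_GBPS` and `relative_throughput` are made up for this illustration.

```python
# Peak global memory bandwidth in GB/s, as quoted in this thread.
PEAK_GBPS = {
    "GTX 260": 112.0,
    "GTX 280": 142.0,
    "Tesla C1060": 102.0,
}


def relative_throughput(baseline):
    """Return each card's peak bandwidth as a multiple of the baseline card's.

    Assumption: a memory-bound kernel scales with peak bandwidth.
    """
    base = PEAK_GBPS[baseline]
    return {name: bandwidth / base for name, bandwidth in PEAK_GBPS.items()}


ratios = relative_throughput("GTX 260")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the GTX 260")
```

Under that assumption the GTX 280 comes out roughly a quarter faster than the GTX 260, and the Tesla C1060 somewhat slower. Timing the actual kernel on each card is the only way to confirm how closely real random reads follow these ratios.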