Fermi Texture Cache Bandwidth


I’m running a memory-bandwidth-limited kernel that reads a lot of data through linear texture fetches. The profiler reports:

Texture cache memory throughput(GB/s): 739.63
Texture cache hit rate(%): 94.21
L2 cache texture memory read throughput(GB/s): 45.64

That is, 740/16 ≈ 46 GB/s per SM on a GTX 580 (16 SMs). What is the theoretical maximum texture cache bandwidth per SM? It would be useful to know: if I’m already close to it, the only way to improve performance is to reduce the amount of data being read.
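One way to compare the measured figure against a hardware limit is to express it in bytes per clock per SM, since cache datapath widths are often quoted per cycle. The sketch below is only arithmetic on the profiler number above. It assumes the GTX 580 reference clocks (772 MHz graphics clock, 1544 MHz shader clock) and 16 SMs. Which clock domain the texture path runs in is left as an open assumption, so both conversions are shown.

```python
# Convert the profiler's aggregate texture-cache throughput into per-SM figures.
# Assumptions (not from the profiler output):
#   - GTX 580 (GF110) with 16 SMs enabled
#   - reference clocks: 772 MHz graphics clock, 1544 MHz shader clock
TEX_CACHE_GBPS = 739.63   # "Texture cache memory throughput(GB/s)" from the profiler
NUM_SMS = 16
CORE_CLOCK_GHZ = 0.772
SHADER_CLOCK_GHZ = 1.544


def per_sm_gbps(total_gbps: float, num_sms: int) -> float:
    """Split the aggregate throughput evenly across SMs."""
    return total_gbps / num_sms


def bytes_per_clock(gbps: float, clock_ghz: float) -> float:
    """(1e9 bytes/s) / (1e9 cycles/s) = bytes per cycle."""
    return gbps / clock_ghz


per_sm = per_sm_gbps(TEX_CACHE_GBPS, NUM_SMS)

if __name__ == "__main__":
    print(f"per SM:          {per_sm:.2f} GB/s")
    print(f"at core clock:   {bytes_per_clock(per_sm, CORE_CLOCK_GHZ):.1f} B/clk per SM")
    print(f"at shader clock: {bytes_per_clock(per_sm, SHADER_CLOCK_GHZ):.1f} B/clk per SM")
```

Under these assumptions the measured rate works out to about 46.2 GB/s per SM. That is roughly 60 B/clk at the graphics clock or roughly 30 B/clk at the shader clock. Whichever clock actually drives the texture path gives the figure to hold up against the per-SM texture cache width.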