texture cache memory bandwidth

Hi folks,

I have another question related to the CUDA profiler. One thing I’ve noticed is that when you click on “Session -> Global Memory Throughput”, the calculation is based only on the “gst” and “gld” performance counters.

However, texture cache misses can also consume Global Memory bandwidth, am I correct?

The profiler does report the total number of texture cache misses (“tex cache miss”). Does anyone know how I would translate that into GB/sec?


Hard to say. You don’t know how much memory was actually read on each miss, because the texture cache is organized around 2D locality rather than single linear cache lines, and I don’t think NVIDIA documents exactly how the texture cache works.
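
If you want a ballpark figure anyway, you can plug in a guess for the bytes fetched per miss and see what range of bandwidth that gives. The sketch below is only that: the function name, the 32-byte default for `bytes_per_miss`, and the `counter_scale` factor (for the case where the counter covers only part of the chip) are all my assumptions, not anything documented for the profiler. Treat the result as a rough estimate, not a measurement.

```python
def texture_miss_bandwidth_gbps(tex_cache_misses, kernel_time_us,
                                bytes_per_miss=32, counter_scale=1.0):
    """Rough estimate of global memory traffic caused by texture cache misses.

    tex_cache_misses: value of the "tex cache miss" counter for one kernel.
    kernel_time_us:   kernel execution time in microseconds.
    bytes_per_miss:   ASSUMED bytes fetched per miss (not documented; a guess).
    counter_scale:    ASSUMED factor to extrapolate the counter to the whole
                      GPU, if it was only sampled on part of the chip.
    Returns an estimate in GB/s (1 GB = 1e9 bytes).
    """
    if kernel_time_us <= 0:
        raise ValueError("kernel_time_us must be positive")
    total_bytes = tex_cache_misses * bytes_per_miss * counter_scale
    seconds = kernel_time_us * 1e-6
    return total_bytes / seconds / 1e9


# Example: 1,000,000 misses in a 1000 us kernel, assuming 32 bytes per miss,
# works out to about 32 GB/s.
estimate = texture_miss_bandwidth_gbps(1_000_000, 1000.0)
```

Trying a few values of `bytes_per_miss` (say 32, 64 and 128) at least gives you a lower and upper bracket, which is probably the best you can do without knowing the real fetch granularity.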