Does device memory throughput include texture reads (CUDA 5.5 / Tesla K20c)?


I am tuning a memory-intensive kernel on a Tesla K20c and am trying to interpret some of the results from the Visual Profiler. As I understand it, device memory read throughput should be the combined throughput for all data sent from device memory to the L2 cache, the texture cache, the read-only (RO) data cache, and the constant cache. Is that correct, or does the metric include only global memory reads (L2 cache)?