Some sources say the texture cache is designed to improve throughput rather than to reduce latency. The CUDA Best Practices Guide says texture fetches are cached, potentially exhibiting higher bandwidth if there is 2D locality in the texture fetches. Suppose an application has non-coalesced global memory accesses, but each element is accessed only once (no data reuse). Does that mean every texture fetch is actually a miss? Is there still a benefit (or a penalty) if we bind the global memory to a texture? Thank you.
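For reference, the access pattern in question can be sketched roughly as follows. This is only an illustration under assumptions: the names `gather_global`, `gather_texture`, `compare_gather` and the `stride` parameter are made up for this post, and the texture path uses the texture object API (`cudaCreateTextureObject` with `tex1Dfetch`) over linear memory rather than a legacy texture reference binding. Each input element is read exactly once, with consecutive threads reading addresses `stride` elements apart.

```cuda
#include <cassert>
#include <cstdio>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper macro: bail out of the host function on any CUDA error.
#define CHECK(call) do { if ((call) != cudaSuccess) return false; } while (0)

// Strided (non-coalesced) read through ordinary global loads.
// With gcd(stride, n) == 1 every input element is read exactly once.
__global__ void gather_global(const float* in, float* out, int n, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int src = (int)(((long long)i * stride) % n);
        out[i] = in[src];
    }
}

// Same access pattern, but fetched through a texture object.
__global__ void gather_texture(cudaTextureObject_t tex, float* out, int n, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int src = (int)(((long long)i * stride) % n);
        out[i] = tex1Dfetch<float>(tex, src);
    }
}

// Runs both kernels, prints their timings, and returns true if results match.
bool compare_gather(int n, int stride)
{
    size_t bytes = (size_t)n * sizeof(float);
    std::vector<float> h_in(n), h_a(n), h_b(n);
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    float *d_in = nullptr, *d_a = nullptr, *d_b = nullptr;
    CHECK(cudaMalloc(&d_in, bytes));
    CHECK(cudaMalloc(&d_a, bytes));
    CHECK(cudaMalloc(&d_b, bytes));
    CHECK(cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice));

    // Describe the linear device buffer as a 1D texture of floats.
    cudaResourceDesc res;
    std::memset(&res, 0, sizeof(res));
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = d_in;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = bytes;
    cudaTextureDesc td;
    std::memset(&td, 0, sizeof(td));
    td.readMode = cudaReadModeElementType;
    cudaTextureObject_t tex = 0;
    CHECK(cudaCreateTextureObject(&tex, &res, &td, nullptr));

    cudaEvent_t t0, t1, t2;
    CHECK(cudaEventCreate(&t0));
    CHECK(cudaEventCreate(&t1));
    CHECK(cudaEventCreate(&t2));

    int threads = 256, blocks = (n + threads - 1) / threads;
    CHECK(cudaEventRecord(t0));
    gather_global<<<blocks, threads>>>(d_in, d_a, n, stride);
    CHECK(cudaEventRecord(t1));
    gather_texture<<<blocks, threads>>>(tex, d_b, n, stride);
    CHECK(cudaEventRecord(t2));
    CHECK(cudaGetLastError());
    CHECK(cudaEventSynchronize(t2));

    float ms_global = 0.0f, ms_tex = 0.0f;
    CHECK(cudaEventElapsedTime(&ms_global, t0, t1));
    CHECK(cudaEventElapsedTime(&ms_tex, t1, t2));
    std::printf("global: %.3f ms, texture: %.3f ms\n", ms_global, ms_tex);

    CHECK(cudaMemcpy(h_a.data(), d_a, bytes, cudaMemcpyDeviceToHost));
    CHECK(cudaMemcpy(h_b.data(), d_b, bytes, cudaMemcpyDeviceToHost));

    bool same = true;
    for (int i = 0; i < n; ++i) {
        if (h_a[i] != h_b[i]) { same = false; break; }
    }

    cudaDestroyTextureObject(tex);
    cudaEventDestroy(t0); cudaEventDestroy(t1); cudaEventDestroy(t2);
    cudaFree(d_in); cudaFree(d_a); cudaFree(d_b);
    return same;
}
```

Calling something like `compare_gather(1 << 20, 33)` from `main` would run both versions on the same data. The printed timings are only a rough indication: the first kernel absorbs any warm-up cost, the second may find data already in L2, and the outcome depends on the GPU architecture, so a profiler run with each kernel measured separately would be more trustworthy.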