No speedup with textures

Is it possible to see no speedup after switching to texture memory in CUDA?
What would be the most important factor? Has anyone had the same experience?

Thanks.

Bottlenecks in CUDA code can be many things: divergence, too many registers (which reduces the number of resident blocks), use of local memory, memory bandwidth, PCIe bandwidth, memory latency, a low active warp count causing pipeline stalls, mismatched block workloads, etc.

Texture queries only help with global memory latency, so if your bottleneck is elsewhere, it’s not going to do anything for you.
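
As a rough way to check a few of the items listed above (register count, local memory use, resident blocks per multiprocessor), the runtime API can be queried directly. This is only a sketch: `scaleKernel` and `reportKernelResources` are hypothetical names made up for illustration, and the other bottlenecks in the list (divergence, PCIe bandwidth, and so on) need a profiler instead.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, present only so there is something to inspect.
__global__ void scaleKernel(const float* in, float* out, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * factor;
}

// Prints registers per thread, local memory per thread, and the number of
// blocks of the given size that can be resident on one multiprocessor.
// Returns that block count, or -1 if a CUDA call fails.
int reportKernelResources(int blockSize) {
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, scaleKernel) != cudaSuccess) return -1;

    int blocksPerSM = 0;
    if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(
            &blocksPerSM, scaleKernel, blockSize, 0) != cudaSuccess) return -1;

    printf("registers per thread:    %d\n", attr.numRegs);
    printf("local memory per thread: %zu bytes\n", attr.localSizeBytes);
    printf("resident blocks per SM:  %d (block size %d)\n", blocksPerSM, blockSize);
    return blocksPerSM;
}
```

Calling `reportKernelResources(256)` from `main` prints the three figures. A nonzero local memory size or a very low resident block count would suggest looking there before trying textures.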

Texture cache is not very helpful. For example, it is no better in terms of bandwidth or latency compared to coalesced reads. It is useful because it loosens the coalescing requirements, and works better when coalescing is not possible. In general, it is not what a proper “cache” should be.
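
To see whether this matters for a particular access pattern, one can time the same scattered read through a plain global pointer and through a texture fetch. The sketch below assumes the texture object API (`cudaCreateTextureObject`, `tex1Dfetch`), which is newer than the texture reference API and may not be what this thread was written against. The kernel names, `compareGatherPaths`, and the strided index pattern are all made up for illustration, and the timings from a single run are only indicative.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Scattered index shared by both kernels; 64-bit math avoids overflow of i * stride.
__device__ int scatteredIndex(int i, int n, int stride) {
    return (int)(((long long)i * stride) % n);
}

// Non-coalesced read through an ordinary global memory pointer.
__global__ void gatherGlobal(const float* in, float* out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[scatteredIndex(i, n, stride)];
}

// The same read pattern, fetched through the texture path.
__global__ void gatherTexture(cudaTextureObject_t tex, float* out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tex1Dfetch<float>(tex, scatteredIndex(i, n, stride));
}

// Runs both kernels, prints their times, and returns the number of elements
// on which the two paths disagree (-1 if a CUDA call fails).
int compareGatherPaths(int n, int stride) {
    size_t bytes = (size_t)n * sizeof(float);
    std::vector<float> hIn(n), hGlobal(n), hTex(n);
    for (int i = 0; i < n; ++i) hIn[i] = (float)i;

    float *dIn = nullptr, *dGlobal = nullptr, *dTex = nullptr;
    if (cudaMalloc(&dIn, bytes) != cudaSuccess ||
        cudaMalloc(&dGlobal, bytes) != cudaSuccess ||
        cudaMalloc(&dTex, bytes) != cudaSuccess) return -1;
    cudaMemcpy(dIn, hIn.data(), bytes, cudaMemcpyHostToDevice);

    // Describe the linear buffer so it can be read through a texture object.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = dIn;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = bytes;
    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;
    cudaTextureObject_t tex = 0;
    if (cudaCreateTextureObject(&tex, &res, &td, nullptr) != cudaSuccess) return -1;

    int block = 256, grid = (n + block - 1) / block;
    // Untimed warm-up launches so one-time startup cost is not measured.
    gatherGlobal<<<grid, block>>>(dIn, dGlobal, n, stride);
    gatherTexture<<<grid, block>>>(tex, dTex, n, stride);

    cudaEvent_t t0, t1, t2;
    cudaEventCreate(&t0); cudaEventCreate(&t1); cudaEventCreate(&t2);
    cudaEventRecord(t0);
    gatherGlobal<<<grid, block>>>(dIn, dGlobal, n, stride);
    cudaEventRecord(t1);
    gatherTexture<<<grid, block>>>(tex, dTex, n, stride);
    cudaEventRecord(t2);
    cudaEventSynchronize(t2);
    if (cudaGetLastError() != cudaSuccess) return -1;

    float msGlobal = 0.0f, msTex = 0.0f;
    cudaEventElapsedTime(&msGlobal, t0, t1);
    cudaEventElapsedTime(&msTex, t1, t2);
    printf("global: %.3f ms, texture: %.3f ms\n", msGlobal, msTex);

    cudaMemcpy(hGlobal.data(), dGlobal, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(hTex.data(), dTex, bytes, cudaMemcpyDeviceToHost);
    int mismatches = 0;
    for (int i = 0; i < n; ++i) if (hGlobal[i] != hTex[i]) ++mismatches;

    cudaEventDestroy(t0); cudaEventDestroy(t1); cudaEventDestroy(t2);
    cudaDestroyTextureObject(tex);
    cudaFree(dIn); cudaFree(dGlobal); cudaFree(dTex);
    return mismatches;
}
```

Calling `compareGatherPaths(1 << 20, 33)` from `main` exercises a strided, non-coalesced pattern, while a stride of 1 gives the fully coalesced case for comparison. If the two times are close, the kernel is probably limited by something other than how global memory is read.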

My previous post was wrong: textures reduce device memory BANDWIDTH usage, but have similar latency.

The idea is that a read served from the cache eliminates the need to go off-chip for the data. Every read that leaves the GPU chip for global memory uses memory bandwidth.

But the reads are not reordered just because they’re available sooner, so the memory latency is (mostly) the same.

What do you mean by “reordered”? Is the reduction in latency caused by reordering?