Any advantage to texturing from device memory? tex1Dfetch vs tex2D

Hi everyone,
how are you?

When I was using textures, I found that texturing from an array has a lot of advantages over texturing from device memory, so I was wondering whether there is any advantage to texturing from device memory. Is it faster than the other one?

Thank you

Texturing direct from device memory using tex1Dfetch() is no faster than texturing from arrays using tex2D(), but it does have the advantage that for multi-pass algorithms you can write directly to the memory instead of having to copy the results back to an array.

Note that in practice you need to double buffer your arrays, since there is no guarantee that values written to global memory that is bound as a texture will update the texture cache.
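
Here is a minimal sketch of that double-buffered multi-pass pattern. It assumes the texture object API (cudaTextureObject_t) from later CUDA releases, not necessarily the binding calls that were current when this thread was written. The kernel smoothPass, the helper makeLinearTexture, the buffer size and the pass count are all hypothetical names and values picked for illustration, and error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical pass: average each element with its right neighbour.
// Reads go through the texture (tex1Dfetch), writes go to plain global memory.
__global__ void smoothPass(cudaTextureObject_t src, float* dst, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int j = min(i + 1, n - 1);  // clamp the neighbour index by hand
        dst[i] = 0.5f * (tex1Dfetch<float>(src, i) + tex1Dfetch<float>(src, j));
    }
}

// Hypothetical helper: wrap a linear device buffer in a texture object.
static cudaTextureObject_t makeLinearTexture(float* devPtr, int n)
{
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = devPtr;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = n * sizeof(float);

    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);
    return tex;
}

int main()
{
    const int n = 1 << 20;      // assumed problem size
    const int passes = 4;       // assumed number of passes

    // Two buffers: one is read as a texture while the other is written.
    float* buf[2];
    cudaMalloc(&buf[0], n * sizeof(float));
    cudaMalloc(&buf[1], n * sizeof(float));
    cudaMemset(buf[0], 0, n * sizeof(float));

    cudaTextureObject_t tex[2] = { makeLinearTexture(buf[0], n),
                                   makeLinearTexture(buf[1], n) };

    int src = 0;
    for (int pass = 0; pass < passes; ++pass) {
        int dst = 1 - src;
        // A launch never writes the buffer it is reading through the texture.
        smoothPass<<<(n + 255) / 256, 256>>>(tex[src], buf[dst], n);
        src = dst;              // the output of this pass feeds the next one
    }
    cudaDeviceSynchronize();
    printf("status: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaDestroyTextureObject(tex[0]);
    cudaDestroyTextureObject(tex[1]);
    cudaFree(buf[0]);
    cudaFree(buf[1]);
    return 0;
}
```

The point is that no copy back into an array is needed between passes: the result of one pass is already sitting in linear memory, and the next pass just reads it through the other texture.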

One key difference that Simon didn't mention is that tex1Dfetch only has a 1-dimensional cache, meaning that accesses are fast only if the threads in a warp access the texture with good 1D locality. Reading from an array with tex2D gives you a 2D cache, so you get good performance when the threads in a warp access memory with 2D locality (e.g. neighbouring texels down a column, not only along a row).

This may explain the performance difference you see, if your warps have 2D locality in their texture reads.
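
For comparison, here is a rough sketch of a 2D-local read through tex2D from a CUDA array, again assuming the texture object API. The kernel name copyFromTex2D, the 512x512 size and the 16x16 block shape are hypothetical choices for illustration, and error checking is omitted.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// With a 16x16 block, each warp of 32 threads covers a 16x2 patch of texels,
// so its reads are close together in both x and y.
__global__ void copyFromTex2D(cudaTextureObject_t tex, float* out,
                              int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        // +0.5f addresses the texel centre with unnormalized coordinates.
        out[y * width + x] = tex2D<float>(tex, x + 0.5f, y + 0.5f);
    }
}

int main()
{
    const int width = 512, height = 512;   // assumed image size
    std::vector<float> host(width * height, 1.0f);

    // Allocate a 2D CUDA array and copy the host data into it.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr;
    cudaMallocArray(&arr, &desc, width, height);
    cudaMemcpy2DToArray(arr, 0, 0, host.data(),
                        width * sizeof(float),   // source pitch in bytes
                        width * sizeof(float),   // row width in bytes
                        height, cudaMemcpyHostToDevice);

    // Describe the array as a texture resource.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;

    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    float* out;
    cudaMalloc(&out, width * height * sizeof(float));

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    copyFromTex2D<<<grid, block>>>(tex, out, width, height);
    cudaDeviceSynchronize();
    printf("status: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
    cudaFree(out);
    return 0;
}
```

Note that the data has to be copied into the array before it can be read this way, which is the extra step that texturing from linear memory avoids.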

Does that mean tex2D is better than tex1Dfetch in every respect? Then why does tex1Dfetch still exist?

I'm really confused about it.

No. If you only need locality in one dimension, you are better off with tex1Dfetch, since you will get more cache hits that way.

Also, tex1Dfetch can read directly from device memory, so you don't have to copy updated data into array memory for tex2D every time, as Simon mentioned. Additionally, tex1Dfetch can address longer 1D arrays (up to 2^27 elements).

Thank you