While using textures, I found that texturing from an array has a lot of advantages over texturing from device memory, so I was wondering: is there any advantage to texturing from device memory? Is it faster than the other one?
Texturing directly from device memory with tex1Dfetch() is no faster than texturing from arrays with tex2D(), but it does have the advantage that in multi-pass algorithms you can write results straight to that memory instead of copying them back into an array after each pass.
Note that in practice you need to double-buffer your device memory (read from one buffer through the texture, write to the other, then swap after each pass), since there is no guarantee that values written to global memory that is bound as a texture will update the texture cache.
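To make the double-buffering point concrete, here is a minimal sketch of a multi-pass smoothing filter that ping-pongs between two linear device buffers. It assumes the texture object API (compute capability 3.0 or later) rather than the older bound texture references; `smooth_pass`, `make_linear_tex` and `run_passes` are made-up names for illustration, and error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>

// One pass: read neighbours through the texture, write to the OTHER buffer.
// The kernel never writes to the memory it is reading through the texture.
__global__ void smooth_pass(cudaTextureObject_t src, float *dst, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int l = max(i - 1, 0);      // clamp indices by hand at the borders
    int r = min(i + 1, n - 1);
    dst[i] = (tex1Dfetch<float>(src, l) +
              tex1Dfetch<float>(src, i) +
              tex1Dfetch<float>(src, r)) / 3.0f;
}

// Hypothetical helper: wrap a linear device buffer in a texture object.
static cudaTextureObject_t make_linear_tex(float *d_buf, int n)
{
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = d_buf;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = n * sizeof(float);

    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);
    return tex;
}

// Hypothetical driver: runs `passes` smoothing passes over h_data in place.
void run_passes(float *h_data, int n, int passes)
{
    float *d_buf[2];
    cudaTextureObject_t tex[2];
    for (int b = 0; b < 2; ++b) {
        cudaMalloc(&d_buf[b], n * sizeof(float));
        tex[b] = make_linear_tex(d_buf[b], n);
    }
    cudaMemcpy(d_buf[0], h_data, n * sizeof(float), cudaMemcpyHostToDevice);

    int src = 0;  // index of the buffer currently read through the texture
    for (int p = 0; p < passes; ++p) {
        smooth_pass<<<(n + 255) / 256, 256>>>(tex[src], d_buf[1 - src], n);
        src = 1 - src;  // the buffer just written becomes the next source
    }
    // This blocking copy also waits for the last kernel to finish.
    cudaMemcpy(h_data, d_buf[src], n * sizeof(float), cudaMemcpyDeviceToHost);

    for (int b = 0; b < 2; ++b) {
        cudaDestroyTextureObject(tex[b]);
        cudaFree(d_buf[b]);
    }
}
```

Each pass writes straight into plain device memory and the next launch textures from it, so there is no copy back into an array between passes. The swap happens only between kernel launches, which is what keeps a kernel from reading stale texture cache lines.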
One key difference that Simon didn’t mention is that tex1Dfetch only gives you 1D cache locality, meaning that accesses are fast only if the threads in a warp read addresses that are close together in linear memory. Reading from an array with tex2D gives you 2D cache locality, so you get good performance when the threads in a warp read texels that are close together in 2D (e.g. a small tile, or down a column, rather than only along a row).
This may explain the performance difference you see, if your warps have 2D locality in their texture reads.
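As a rough sketch of what an access pattern with 2D locality looks like, the kernel below uses 8x8 thread blocks, so each warp covers an 8x4 tile, and every thread reads a 3x3 neighbourhood with tex2D from a CUDA array. It again assumes the texture object API; `box3x3` and `run_box3x3` are made-up names, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

// Each thread averages the 3x3 neighbourhood around its texel.
// Neighbouring threads read overlapping texels in both x and y.
__global__ void box3x3(cudaTextureObject_t tex, float *out, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    float sum = 0.0f;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            // +0.5f addresses texel centres; borders use clamp addressing
            sum += tex2D<float>(tex, x + dx + 0.5f, y + dy + 0.5f);
    out[y * w + x] = sum / 9.0f;
}

// Hypothetical driver: filters a w x h row-major host image into h_out.
void run_box3x3(const float *h_in, float *h_out, int w, int h)
{
    // Textures read with tex2D here live in a CUDA array, not linear memory.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr;
    cudaMallocArray(&arr, &desc, w, h);
    cudaMemcpy2DToArray(arr, 0, 0, h_in, w * sizeof(float),
                        w * sizeof(float), h, cudaMemcpyHostToDevice);

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;

    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    float *d_out;
    cudaMalloc(&d_out, w * h * sizeof(float));

    dim3 block(8, 8);  // one warp = an 8x4 tile of texels
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    box3x3<<<grid, block>>>(tex, d_out, w, h);

    cudaMemcpy(h_out, d_out, w * h * sizeof(float), cudaMemcpyDeviceToHost);

    cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
    cudaFree(d_out);
}
```

Note the extra cudaMemcpy2DToArray step: in a multi-pass algorithm that copy into the array would have to be repeated after every pass, which is the cost the tex1Dfetch approach avoids.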
tex1Dfetch also has the advantage that it reads directly from device memory, so you don’t have to copy updated data into the array used by tex2D every time, as Simon mentioned. Additionally, tex1Dfetch can address longer 1D arrays (up to 2^27 elements).