Textures: linear memory vs cudaArrays

Can someone tell me whether linear memory that is bound to a texture and read via a texture reference is cached in the texture cache in the same way as reads from cudaArrays? I have read the programming guide, but it's not clear. It mentions that cudaArrays are optimised for texture fetching, yet linear memory can also be read through texture fetches, so I'm not sure where the performance difference comes from.


Both get cached. Cache-behavior of textures bound to 2D cudaArrays may be better, due to cache optimization for 2D locality.
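For anyone finding this thread later, the two binding paths look roughly like this with the texture reference API. A minimal sketch, not a tested program; the names (`texLinear`, `texArr`, `bindBoth`) and sizes are just for illustration:

```cuda
#include <cuda_runtime.h>

// Texture references must be declared at file scope (legacy texture API).
texture<float, 1, cudaReadModeElementType> texLinear;  // for linear memory
texture<float, 2, cudaReadModeElementType> texArr;     // for a 2D cudaArray

__global__ void readLinear(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch(texLinear, i);  // integer index, no filtering
}

void bindBoth(float *d_data, int n, const float *h_img, int w, int h)
{
    // 1) Linear device memory bound to a 1D texture: no extra allocation.
    cudaBindTexture(0, texLinear, d_data, n * sizeof(float));

    // 2) A 2D cudaArray: needs its own allocation and a copy into it.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray *arr;
    cudaMallocArray(&arr, &desc, w, h);
    cudaMemcpyToArray(arr, 0, 0, h_img, w * h * sizeof(float),
                      cudaMemcpyHostToDevice);
    cudaBindTextureToArray(texArr, arr);
    // kernels then read it with tex2D(texArr, x, y)
}
```

Both paths go through the texture cache; only the cudaArray case gets the layout that is optimised for 2D locality.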


Thanks for the helpful reply!

Depending on your access pattern, I’d suggest some experimenting, to see which gives you better performance. I’ve seen some cases where 1D texture bound to linear memory gave higher performance for sequential accesses.


My tests confirm this. I tried to "cheat" the 2D cache by putting my 1D data into a M x M/N 2D array (where M is small). Performance was about 5% slower than with the simple 1D data bound to device memory. Plus, with it bound to device memory, you don't need to do Dev->Dev transfers to update the cudaArray.
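To make that last point concrete: with linear memory you just overwrite the same device pointer between kernel launches, while a bound cudaArray needs an extra copy into the array. A hedged sketch, reusing hypothetical names like `d_data` and `arr`:

```cuda
// Linear memory bound to a texture: update it in place (from the host,
// or by a kernel writing to d_data). No rebinding, no extra transfer.
// Note: texture reads only see the new data on the *next* kernel launch.
cudaMemcpy(d_data, h_new, n * sizeof(float), cudaMemcpyHostToDevice);

// A bound cudaArray: results produced in device memory (d_result) need
// an additional device-to-device transfer into the array first.
cudaMemcpyToArray(arr, 0, 0, d_result, w * h * sizeof(float),
                  cudaMemcpyDeviceToDevice);
```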

The Programming Guide says:

“The cache working set for one-dimensional textures is 8 KB per multiprocessor;”

What about two-dimensional cudaArrays?

Ah, I'm laughing, I went in search of that exact question and ended back up at my own thread. I would really appreciate an answer to this as well. Does the line above simply mean that the effective cache size per multiprocessor for textures is 8 KB, period? I.e., regardless of texture dimension, even though it mentions 1D specifically.



The dark secret of the 2D cache seems to be NVIDIA classified :(
Maybe we should test it ourselves.
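One rough way to test it ourselves: bind a 1D texture, have a single block repeatedly re-fetch a working set of a given size, and time the kernel as the working set grows; the time per fetch should jump once the working set no longer fits in the per-multiprocessor cache. A hypothetical, untested sketch (sizes and iteration counts are arbitrary):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

texture<float, 1, cudaReadModeElementType> texProbe;

// Each thread walks a working set of wsElems floats many times over.
// Once wsElems * sizeof(float) exceeds the texture cache working set,
// most fetches should miss and the kernel time should rise sharply.
__global__ void probe(float *out, int wsElems, int iters)
{
    int i = threadIdx.x;
    float acc = 0.0f;
    for (int k = 0; k < iters; ++k) {
        acc += tex1Dfetch(texProbe, i);
        i += blockDim.x;
        if (i >= wsElems) i -= wsElems;  // wrap within the working set
    }
    out[threadIdx.x] = acc;  // keep the compiler from removing the loop
}

int main()
{
    const int maxElems = 64 * 1024;  // 256 KB, well past any likely cache
    float *d_buf, *d_out;
    cudaMalloc(&d_buf, maxElems * sizeof(float));
    cudaMalloc(&d_out, 256 * sizeof(float));
    cudaMemset(d_buf, 0, maxElems * sizeof(float));
    cudaBindTexture(0, texProbe, d_buf, maxElems * sizeof(float));

    for (int ws = 1024; ws <= maxElems; ws *= 2) {  // 4 KB .. 256 KB
        cudaEvent_t t0, t1;
        cudaEventCreate(&t0);
        cudaEventCreate(&t1);
        cudaEventRecord(t0);
        probe<<<1, 256>>>(d_out, ws, 100000);  // one block -> one SM
        cudaEventRecord(t1);
        cudaEventSynchronize(t1);
        float ms;
        cudaEventElapsedTime(&ms, t0, t1);
        printf("working set %6zu bytes: %.2f ms\n",
               ws * sizeof(float), ms);
    }
    return 0;
}
```

Running this for 1D vs 2D textures (the 2D case would need a cudaArray and `tex2D` indexing instead) would show whether the effective cache size actually differs by dimensionality.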

I’m pretty sure the effective cache size is the same for 1D, 2D or 3D textures. I’m not sure why we say that in the programming guide.

The texture cache is basically there to provide good performance when accessing neighbouring texels for filtering, nothing more.

Thanks for the explanation Simon.