Can someone tell me if linear memory that is bound to a texture and read via a texture reference is cached in the texture cache in the same way as reads from cudaArrays? I have read the programming guide, but it's not clear. It mentions that cudaArrays are optimised for texture fetching, though linear memory can also be read via texture fetches, so I'm not sure where the performance difference comes from.
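For reference, here is a minimal sketch of the two paths I'm comparing, using the legacy texture reference API; the names (texRefLinear, texRefArray, devPtr, cuArray) are just placeholders:

```cuda
#include <cuda_runtime.h>

texture<float, 1, cudaReadModeElementType> texRefLinear;  // bound to linear device memory
texture<float, 1, cudaReadModeElementType> texRefArray;   // bound to a cudaArray

__global__ void readLinear(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch(texRefLinear, i);          // integer index, linear memory
}

__global__ void readArray(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1D(texRefArray, (float)i + 0.5f);  // float coordinate, cudaArray
}

void bindBoth(float *devPtr, cudaArray *cuArray, size_t bytes)
{
    // Path 1: texture reference bound directly to linear device memory
    cudaBindTexture(0, texRefLinear, devPtr, bytes);

    // Path 2: texture reference bound to a cudaArray
    cudaBindTextureToArray(texRefArray, cuArray);
}
```

The question is whether fetches through texRefLinear go through the same texture cache as fetches through texRefArray.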
Depending on your access pattern, I'd suggest experimenting to see which gives you better performance. I've seen cases where a 1D texture bound to linear memory gave higher performance for sequential accesses.
My tests confirm this. I tried to "cheat" the 2D cache by putting my 1D data into an M x M/N 2D array (where M is small). Performance was about 5% slower than using the simple 1D data bound to device memory. Plus, with the texture bound to device memory, you don't need to do Dev->Dev transfers to update the cudaArray (see the sketch below).
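To illustrate the update step I mean, here is a rough sketch under my assumptions; updateKernel, cuArray, devPtr, bytes, and n are hypothetical names:

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel that writes new values straight into linear memory.
__global__ void updateKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;   // placeholder update
}

void updateForNextPass(cudaArray *cuArray, float *devPtr, size_t bytes, int n)
{
    // cudaArray path: results produced in devPtr have to be copied into the
    // array before the next texturing kernel sees them (a Dev->Dev transfer).
    cudaMemcpyToArray(cuArray, 0, 0, devPtr, bytes, cudaMemcpyDeviceToDevice);

    // Linear-memory path: the texture is bound to devPtr itself, so updating
    // devPtr in place is enough -- no extra copy before the next launch.
    updateKernel<<<(n + 255) / 256, 256>>>(devPtr, n);
}
```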
Ah, I'm laughing: I went in search of that exact question and ended up back at my own thread. I would really appreciate an answer to this as well. Does the line above simply mean that the effective texture cache size per multiprocessor is 8 KB, period? I.e., regardless of texture dimensionality, even though it mentions 1D specifically?