Texture vs. global memory: which of these is faster?


I’m testing memory access to find out which type is faster, texture or global.

My tests suggest that reading global memory is faster.
Is this right?
Or am I doing something wrong?

The answer depends on many factors:

- GPU architecture: pre-Fermi cards do not cache global memory reads, so on those cards texture memory, which has its own cache, should give better memory performance.
- Memory access pattern: on Fermi, global memory is cached in L1/L2, and texture memory is ordinary device memory read through a separate texture cache, so you are actually asking which cache is faster.
- Memory access pattern again: is it fully coalesced, partially coalesced, completely random, or locally random/sequential (spatially and/or temporally)? Is each memory location read once or many times?
- Memory access pattern again and again: do you read chars, ints, or doubles? Each cache has a different cache line width and behaves differently for different data types; moreover, L2 is global, L1 is local to a multiprocessor, and the texture cache is somewhere in between, I guess.
- L1/L2 cache configuration: if I remember correctly, you can switch L1 caching of global loads off/on at compile time and change how on-chip memory is split between shared memory and L1 at runtime.
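The cache configuration point can be tried out roughly as follows. This is only a minimal sketch: `copyKernel` is a hypothetical kernel made up for illustration, `cudaFuncSetCacheConfig` is the runtime call that states a preference for how on-chip memory is divided between L1 and shared memory, and the `-Xptxas -dlcm=cg` flag in the trailing comment is the Fermi-era compile-time switch that makes global loads bypass L1. Newer GPUs may treat both as hints or ignore them.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread copies one element via plain global loads.
__global__ void copyKernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

int main() {
    // Ask the runtime to favour a larger L1 for this kernel.
    // This is a preference only; the driver may ignore it on some GPUs.
    cudaError_t err = cudaFuncSetCacheConfig(copyKernel, cudaFuncCachePreferL1);
    printf("prefer L1:     %s\n", cudaGetErrorString(err));

    // The opposite preference, to compare against in a benchmark.
    err = cudaFuncSetCacheConfig(copyKernel, cudaFuncCachePreferShared);
    printf("prefer shared: %s\n", cudaGetErrorString(err));
    return 0;
}

// Compile-time variant (Fermi-era toolchains):
//   nvcc -Xptxas -dlcm=cg file.cu   -> global loads are cached in L2 only
//   nvcc -Xptxas -dlcm=ca file.cu   -> global loads are cached in L1 and L2
```

Timing the same kernel under each setting is what tells you whether the configuration matters for your access pattern.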

In my application, texture memory is a bit faster than global memory on Fermi, even though the CUDA documentation recommends just the opposite, and it is much faster on a GTX 280.

So: the only way to know is to experiment!
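One way to set up such an experiment is sketched below. It is a minimal example under several assumptions that are not from this thread: a random gather as the access pattern, the texture object API (`cudaTextureObject_t`, which needs compute capability 3.0 or newer, so it will not run on Fermi or a GTX 280), and made-up names such as `gatherGlobal`, `gatherTexture` and `blocksFor`. Swap in your own access pattern and data type before drawing conclusions.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, cudaGetErrorString(e_)); \
    exit(1); } } while (0)

// Number of blocks needed to cover n elements (hypothetical helper).
int blocksFor(int n, int threads) { return (n + threads - 1) / threads; }

// Gather through plain global loads: out[i] = in[idx[i]].
__global__ void gatherGlobal(const float* in, const int* idx, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[idx[i]];
}

// Same gather, but the source array is read through the texture path.
__global__ void gatherTexture(cudaTextureObject_t tex, const int* idx, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tex1Dfetch<float>(tex, idx[i]);
}

int main() {
    const int n = 1 << 22, threads = 256, reps = 20;
    const int blocks = blocksFor(n, threads);
    std::vector<float> hIn(n);
    std::vector<int> hIdx(n);
    for (int i = 0; i < n; ++i) { hIn[i] = float(i); hIdx[i] = rand() % n; }  // random reads

    float *dIn = nullptr, *dOut = nullptr;
    int* dIdx = nullptr;
    CHECK(cudaMalloc(&dIn, n * sizeof(float)));
    CHECK(cudaMalloc(&dOut, n * sizeof(float)));
    CHECK(cudaMalloc(&dIdx, n * sizeof(int)));
    CHECK(cudaMemcpy(dIn, hIn.data(), n * sizeof(float), cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(dIdx, hIdx.data(), n * sizeof(int), cudaMemcpyHostToDevice));

    // Wrap the same linear buffer in a texture object.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = dIn;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = n * sizeof(float);
    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;
    cudaTextureObject_t tex = 0;
    CHECK(cudaCreateTextureObject(&tex, &res, &td, nullptr));

    cudaEvent_t t0, t1;
    CHECK(cudaEventCreate(&t0));
    CHECK(cudaEventCreate(&t1));
    float msGlobal = 0.0f, msTexture = 0.0f;

    gatherGlobal<<<blocks, threads>>>(dIn, dIdx, dOut, n);  // warm-up launch
    CHECK(cudaEventRecord(t0));
    for (int r = 0; r < reps; ++r) gatherGlobal<<<blocks, threads>>>(dIn, dIdx, dOut, n);
    CHECK(cudaEventRecord(t1));
    CHECK(cudaEventSynchronize(t1));
    CHECK(cudaEventElapsedTime(&msGlobal, t0, t1));

    gatherTexture<<<blocks, threads>>>(tex, dIdx, dOut, n);  // warm-up launch
    CHECK(cudaEventRecord(t0));
    for (int r = 0; r < reps; ++r) gatherTexture<<<blocks, threads>>>(tex, dIdx, dOut, n);
    CHECK(cudaEventRecord(t1));
    CHECK(cudaEventSynchronize(t1));
    CHECK(cudaEventElapsedTime(&msTexture, t0, t1));
    CHECK(cudaGetLastError());

    printf("global:  %.3f ms per launch\n", msGlobal / reps);
    printf("texture: %.3f ms per launch\n", msTexture / reps);

    CHECK(cudaDestroyTextureObject(tex));
    CHECK(cudaEventDestroy(t0));
    CHECK(cudaEventDestroy(t1));
    CHECK(cudaFree(dIn));
    CHECK(cudaFree(dOut));
    CHECK(cudaFree(dIdx));
    return 0;
}
```

Replacing `rand() % n` with `i` turns the gather into a fully coalesced read, which is a quick way to see how much the result depends on the access pattern.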

If you are doing anything with a 2D/3D spatial component, I’d recommend texture memory. The texture cache is organized to exploit spatial locality, so reads from points that are close together in space are likely to be cache hits.

I personally like to use texture memory for my read-only accesses, such as constant field data, mostly because of the clamped addressing; it is also a little faster for what I am using it for.
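To illustrate the 2D locality and clamped addressing mentioned above, here is a minimal sketch of a 3x3 box filter that reads a small 2D field through a texture object with `cudaAddressModeClamp`, so neighbours that fall outside the field are replaced by the nearest edge value without any bounds checks in the kernel. The kernel `boxBlur`, the helper `clampi`, and the field size are assumptions made for illustration, and the texture object API requires compute capability 3.0 or newer.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, cudaGetErrorString(e_)); \
    exit(1); } } while (0)

// Host-side clamp used only by the CPU reference (hypothetical helper).
int clampi(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

// 3x3 box filter; out-of-range neighbours are clamped by the texture unit.
__global__ void boxBlur(cudaTextureObject_t tex, float* out, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    float sum = 0.0f;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            sum += tex2D<float>(tex, x + dx + 0.5f, y + dy + 0.5f);  // texel centres
    out[y * w + x] = sum / 9.0f;
}

int main() {
    const int w = 64, h = 48;
    std::vector<float> hIn(w * h), hOut(w * h);
    for (int i = 0; i < w * h; ++i) hIn[i] = float(i % 17);

    // Copy the field into a CUDA array, the layout used for 2D textures.
    cudaChannelFormatDesc ch = cudaCreateChannelDesc<float>();
    cudaArray_t arr;
    CHECK(cudaMallocArray(&arr, &ch, w, h));
    CHECK(cudaMemcpy2DToArray(arr, 0, 0, hIn.data(), w * sizeof(float),
                              w * sizeof(float), h, cudaMemcpyHostToDevice));

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;
    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;  // clamp in x
    td.addressMode[1] = cudaAddressModeClamp;  // clamp in y
    td.filterMode = cudaFilterModePoint;       // no interpolation
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;                   // coordinates in texels
    cudaTextureObject_t tex = 0;
    CHECK(cudaCreateTextureObject(&tex, &res, &td, nullptr));

    float* dOut = nullptr;
    CHECK(cudaMalloc(&dOut, w * h * sizeof(float)));
    dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
    boxBlur<<<grid, block>>>(tex, dOut, w, h);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(hOut.data(), dOut, w * h * sizeof(float), cudaMemcpyDeviceToHost));

    // CPU reference with explicit clamping, to check the texture result.
    float maxErr = 0.0f;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float sum = 0.0f;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    sum += hIn[clampi(y + dy, 0, h - 1) * w + clampi(x + dx, 0, w - 1)];
            maxErr = fmaxf(maxErr, fabsf(hOut[y * w + x] - sum / 9.0f));
        }
    printf("max abs error vs CPU reference: %g\n", maxErr);

    CHECK(cudaDestroyTextureObject(tex));
    CHECK(cudaFreeArray(arr));
    CHECK(cudaFree(dOut));
    return 0;
}
```

The same kernel written against a plain global-memory pointer would need explicit index clamping like the CPU reference, which is the convenience referred to above.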