Speed of Constant memory over Textures

I see posts by AMD people on their forum saying that constant memory is almost as fast as registers, or about 10x faster than the texture cache. I have a very small texture that would fit in about 10 KB, and I am considering moving it to constant memory. That would also free up the texture cache for other data.

What I would like to know is whether constant memory is at least as fast as fetching something that is already in the texture cache. It does not have to be faster, just not slower, for me to make the decision. It would be nice to know if it is much faster, but I do not think these lookups are a large percentage of my wall-clock time, so the change might not make the cut.

The programming guide says that constant memory is fast IF every thread in a warp accesses the same value simultaneously.
If they access different values (for example, an array lookup with a per-thread index), then the accesses are serialized.
So yes, constant memory can be faster or slower than textures depending on your access pattern.
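
To see which case applies to your kernel, you could time the two access patterns directly. The following is only a rough sketch, not a proper benchmark: the names c_table, uniformRead and scatteredRead, the table size, and the i * 37 index scramble are all made up for illustration. It assumes the table is 2560 floats (10 KB), which fits in the 64 KB of constant memory. Both kernels do the same global store, so any difference in time should come from the constant read. The texture path is left out, so you would need to add your existing texture fetch as a third kernel to get the comparison you actually care about.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define TABLE_SIZE 2560  // hypothetical: 2560 floats = 10 KB

// Lookup table in constant memory (illustrative name).
__constant__ float c_table[TABLE_SIZE];

// Every thread reads the same element: the broadcast-friendly case.
__global__ void uniformRead(float *out, int n, int idx)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = c_table[idx];
}

// Neighbouring threads read different elements: the serialized case.
__global__ void scatteredRead(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = c_table[(i * 37) % TABLE_SIZE];
}

int main()
{
    const int n = 1 << 20;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Fill the table on the host and copy it into constant memory.
    static float h_table[TABLE_SIZE];
    for (int k = 0; k < TABLE_SIZE; ++k) h_table[k] = (float)k;
    cudaMemcpyToSymbol(c_table, h_table, sizeof(h_table));

    float *d_out = NULL;
    cudaMalloc((void **)&d_out, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    float ms = 0.0f;

    // Warm-up launches so the timed runs exclude one-time startup cost.
    uniformRead<<<blocks, threads>>>(d_out, n, 7);
    scatteredRead<<<blocks, threads>>>(d_out, n);
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    uniformRead<<<blocks, threads>>>(d_out, n, 7);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("uniform constant read:   %f ms\n", ms);

    cudaEventRecord(start);
    scatteredRead<<<blocks, threads>>>(d_out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("scattered constant read: %f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_out);
    return 0;
}
```

If your real lookups look like scatteredRead, with the index varying from thread to thread inside a warp, the texture version is probably the safer choice. If they look like uniformRead, constant memory is worth trying.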

Constant values also have their own cache (2K? 4K?) independent of the texture cache, at least on G80/G200. It’s likely different for Fermi.

The programming guide says 8 KB of constant cache per multiprocessor and 6-8 KB of texture cache per multiprocessor.