I am unclear about the latency of texture memory versus constant memory accesses in CUDA. I am only interested in accesses that hit the cache.
Section §5.3.2.5 of the CUDA Programming Guide, about Texture Memory, says:
The following document about the GTX280, http://www.networkmultimedia.org/Publications/practicals/beyer2009.pdf (chart at the top of PDF page 25), reports a very small latency (roughly register latency) for constant memory accesses that hit the cache, and a latency of 100 cycles for accesses that hit the texture cache.
Is accessing the constant memory cache (roughly register latency) really much faster than accessing the texture memory cache (apparently 100 cycles)? Does the same hold for G80 and Fermi boards?
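For context, the constant-cache figure could be checked on a given board with a pointer-chasing microbenchmark. The following is only a minimal sketch under stated assumptions: the names `chain`, `CHAIN_LEN`, `ITERS` and `constLatency` are made up for illustration, the 64-entry table is assumed to fit in the constant cache, and `clock64()` requires compute capability 2.0 or later (on G80 or GT200, `clock()` would have to be used instead). The result includes loop overhead, so it is an upper bound rather than an exact latency.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define CHAIN_LEN 64    // assumption: small enough to stay in the constant cache
#define ITERS     1024  // number of timed dependent loads

// Hypothetical lookup table: each entry holds the index of the next entry.
__constant__ unsigned int chain[CHAIN_LEN];

__global__ void constLatency(unsigned int *sink, long long *cycles)
{
    unsigned int idx = 0;

    // Warm-up pass so that the timed loads below hit the cache.
    for (int i = 0; i < CHAIN_LEN; ++i)
        idx = chain[idx];

    long long start = clock64();
    for (int i = 0; i < ITERS; ++i)
        idx = chain[idx];          // each load depends on the previous one
    long long stop = clock64();

    *sink   = idx;                 // keeps the compiler from removing the loop
    *cycles = stop - start;
}

int main()
{
    // Build a simple cyclic chain 0 -> 1 -> ... -> CHAIN_LEN-1 -> 0.
    unsigned int h_chain[CHAIN_LEN];
    for (int i = 0; i < CHAIN_LEN; ++i)
        h_chain[i] = (i + 1) % CHAIN_LEN;
    cudaMemcpyToSymbol(chain, h_chain, sizeof(h_chain));

    unsigned int *d_sink   = nullptr;
    long long    *d_cycles = nullptr;
    cudaMalloc(&d_sink, sizeof(unsigned int));
    cudaMalloc(&d_cycles, sizeof(long long));

    // A single thread, so that no other warp hides the latency.
    constLatency<<<1, 1>>>(d_sink, d_cycles);

    long long h_cycles = 0;
    cudaMemcpy(&h_cycles, d_cycles, sizeof(h_cycles), cudaMemcpyDeviceToHost);
    printf("approx. cycles per constant-cache load: %.1f\n",
           (double)h_cycles / ITERS);

    cudaFree(d_sink);
    cudaFree(d_cycles);
    return 0;
}
```

A comparable measurement for the texture cache would replace the `chain[idx]` load with a dependent texture fetch over the same data, so that both numbers come from the same loop structure.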