How are texture accesses done in GF100?

Unlike the GT200 series GPUs, which have a separate cache system for texture accesses, GF100 does not seem to have a texture-dedicated cache. Only texture-dedicated address fetch hardware and a TLB seem to exist. According to some reviews, such as the one on RealWorldTech, the L2 cache in GF100 is also used for texture accesses. Is there any analysis of how the L1 cache relates to texture accesses, and of how the cache algorithms arbitrate between cache lines that belong to ordinary data accesses and those that belong to texture accesses? Since texture accesses are effectively read-only, is there any optimization at the cache-algorithm level that exploits this, for example by not requiring cache coherency for texture accesses? Thanks.

Partial answer to this question:

A texture cache is present at the L1 level, 12 KB per SM. GT200 has a 6~8 KB L1 texture cache for the 8 cores of each SM (18~24 KB per TPC), while GF100 has a 12 KB L1 texture cache for the 32 cores of each SM (with no TPC concept?). Texture accesses use the unified L2 alongside other memory accesses. There is also a “Uniform Cache” at the SM level, which is used for the constant memory space. Its size is unknown (finding it requires some further microbenchmarks). Without doubt, the “Uniform Cache” is only an L1 cache, and it is backed by the large, unified L2 cache as well.
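To put the sizes quoted above on a per-core footing, here is a small back-of-the-envelope sketch. It uses only the figures from this post. The dictionary layout and function names are made up for illustration, and the 3 SMs per TPC for GT200 is an assumption that matches the 18~24 KB per TPC figure. It is not a measurement of anything.

```python
# Per-core L1 texture cache, using only the figures quoted in this thread.
# l1_tex_kb is a (low, high) range in KB per SM; names here are illustrative.
CONFIGS = {
    "GT200": {"l1_tex_kb": (6, 8), "cores_per_sm": 8},
    "GF100": {"l1_tex_kb": (12, 12), "cores_per_sm": 32},
}

# Assumption: 3 SMs per TPC on GT200, consistent with 18~24 KB per TPC
# divided by 6~8 KB per SM.
GT200_SMS_PER_TPC = 3


def kb_per_core(name):
    """Return the (low, high) KB of L1 texture cache per core."""
    cfg = CONFIGS[name]
    low, high = cfg["l1_tex_kb"]
    return low / cfg["cores_per_sm"], high / cfg["cores_per_sm"]


def gt200_kb_per_tpc():
    """Return the (low, high) KB of L1 texture cache per GT200 TPC."""
    low, high = CONFIGS["GT200"]["l1_tex_kb"]
    return low * GT200_SMS_PER_TPC, high * GT200_SMS_PER_TPC


if __name__ == "__main__":
    for name in CONFIGS:
        low, high = kb_per_core(name)
        print(f"{name}: {low:.3f} to {high:.3f} KB of L1 texture cache per core")
    print("GT200 per TPC (KB):", gt200_kb_per_tpc())
```

With these numbers, GF100 ends up with 0.375 KB of L1 texture cache per core against 0.75 to 1 KB on GT200, so the per-SM cache grew but the per-core share shrank to roughly half or less.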

I don’t know what optimizations each cache has yet. Note that the texture cache was not designed to alleviate the fetch latency problem. It should be interesting to see how the latencies of the ShMem/L1 data cache, the L1 texture cache, and the L2 cache compare…