Questions about L2 texture cache

The block diagram in the NVIDIA GeForce 8800 GPU Architecture Overview (November 2006) shows an L2 cache, but the L2 cache is not discussed anywhere in that paper (or in any other I have found). Is there an L2 texture cache?

I’m also looking at a paper from Wen-mei Hwu’s group (Shane Ryoo, Christopher I. Rodrigues, Sam S. Stone, Sara S. Baghsorkhi, Sain-Zee Ueng and Wen-mei W. Hwu; Program Optimization Study on a 128-Core GPU), which doesn’t say anything about an L2 cache either. It also says that the L1 cache has over 100 cycles of latency. Is that true? So many cycles?

If there really is an L2 cache, I’d like to know how many cycles of latency it has and, if possible, how big it is.

Thank you :)

Each multiprocessor has one 8 KB L1 texture cache. There is one L2 texture cache per ROP (raster operator). The number of ROPs can be calculated as (memory bus width in bits / 64), so the 8800 GTX, with its 384-bit bus, has 384 / 64 = 6.
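To make that rule of thumb concrete, here is a minimal C sketch of the calculation. The function name `rop_count` and the constant `BITS_PER_ROP` are made up for illustration, and the 64-bits-per-ROP figure is simply the rule stated in this post, not something read back from the hardware.

```c
#include <assert.h>

/* Rule of thumb from this post: one ROP (and therefore one L2 texture
 * cache) per 64 bits of memory bus width. Hypothetical names. */
enum { BITS_PER_ROP = 64 };

/* Returns the number of ROPs implied by a memory bus width in bits. */
int rop_count(int bus_width_bits)
{
    /* Under this rule only positive multiples of 64 make sense. */
    assert(bus_width_bits > 0 && bus_width_bits % BITS_PER_ROP == 0);
    return bus_width_bits / BITS_PER_ROP;
}
```

Calling `rop_count(384)` gives 6, matching the GTX figure above; by the same arithmetic a 320-bit bus would give 5.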

I’m not sure, but I think the size of the L2 cache is 64 KB. If someone knows for sure, feel free to correct or confirm this.

I have no idea what the latencies are, but 100 cycles for the L1 cache is almost impossible to believe. After reading the paper, I’m not sure that is what they meant when they said the latency to texture memory is >100 cycles. The ">100" would seem to imply a lower bound of 100 cycles, but I would expect a hit in the L1 cache to cost on the order of 10 cycles. So they either forgot about that case or purposely excluded it.