Can someone describe the capabilities of the 2D cache? Section 184.108.40.206 says it is optimized for 2D spatial locality, but I can’t find any details. For example, if I have a 2D texture of ulong4, how many texels are in the same cache block? Am I correct in assuming that the cache blocks are regularly sized squares tiling the texture? I am getting good results empirically when the data is organized into 32x32 blocks, but I’m not 100% sure whether this is just accidental. Do the cache block sizes depend on the memory type?
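To make the 32x32 organization concrete, here is a minimal sketch of one possible reading: the image is split into 32x32 tiles, the tiles are stored one after another in row-major order, and the texels inside each tile are row-major as well. The names TILE and tiled_index are hypothetical, the width is assumed to be a multiple of 32, and nothing here claims to match what the texture hardware actually does internally.

```cpp
#include <cstddef>

// Assumed tile edge length, taken from the 32x32 blocks mentioned above.
constexpr std::size_t TILE = 32;

// Hypothetical helper: linear offset of texel (x, y) in a tiled layout.
// Tiles are stored row-major, and texels inside a tile are row-major too.
// Assumes width is a multiple of TILE.
inline std::size_t tiled_index(std::size_t x, std::size_t y, std::size_t width) {
    std::size_t tiles_per_row = width / TILE;
    std::size_t tile_id = (y / TILE) * tiles_per_row + (x / TILE);  // which tile
    std::size_t in_tile = (y % TILE) * TILE + (x % TILE);           // offset inside it
    return tile_id * TILE * TILE + in_tile;
}
```

With this mapping, all 1024 texels of a tile occupy one contiguous range, so a thread block that works on one tile touches a single compact region of memory.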
I’m not able to answer your question, but I can share two results from my own experiments:
1. With random access, a 2D texture does not outperform global memory. Even when the whole data set is smaller than 8 KB (the cache size), the texture cache helps very little, and performance is the same as with global memory. I can’t think of a reason why.
2. Loads from global memory into shared memory or registers are block-based, not word-based. That is, a single load such as sm[tx]= (int) gm[i]; can bring in more than the one word we asked for from global memory. So there is room to exploit spatial locality and treat this as a small cache.
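The block-based loading in point 2 can be illustrated with a small counting model. This is only a sketch under stated assumptions: memory is fetched in aligned segments of segment_bytes, and thread t reads element base + t * stride. The function name segments_touched and all of its parameters are hypothetical, and the segment size is an input to the model, not a documented hardware value.

```cpp
#include <cstddef>
#include <set>

// Hypothetical model: counts the distinct aligned segments touched when
// n_threads threads each read one element of elem_bytes bytes, with
// thread t reading element index base + t * stride.
// Assumes elem_bytes divides segment_bytes, so no element straddles two segments.
inline std::size_t segments_touched(std::size_t base, std::size_t stride,
                                    std::size_t n_threads, std::size_t elem_bytes,
                                    std::size_t segment_bytes) {
    std::set<std::size_t> segments;
    for (std::size_t t = 0; t < n_threads; ++t) {
        std::size_t byte_addr = (base + t * stride) * elem_bytes;
        segments.insert(byte_addr / segment_bytes);  // segment holding this element
    }
    return segments.size();
}
```

For 16 threads reading consecutive 4-byte words from an aligned start, with an assumed 64-byte segment, the model reports one segment. With a stride of 16 words it reports 16 segments, one per thread, which is the case where the extra data brought in by each load is wasted.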
I’m sorry, I can’t give you an answer either, but I have another question regarding the texture cache:
Section 5.1 (General Specification) says “The cache working set for 1D textures is 8 KB per multiprocessor”.
I am confused about the “1D”. What does this mean? Only 2D-Textures get cached? Probably not.
1D textures are bound to linear memory. They are cached, but obviously they only exhibit coherency in the “storage direction” of the linear memory, which is basically the same effect that you get with coalesced global memory access.
2D textures are stored in an array memory layout and have a special 2D cache. No GPU vendor will tell you exactly how this works, but the effect is that accesses that are close together in either dimension are fast.
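Since the real layout is not documented, the difference between the two cases can only be shown with a toy model. The sketch below counts how many storage blocks a w x h window of texels touches, once for plain row-by-row storage fetched in runs of block_texels texels, and once for an assumed layout of square tiles. The names linear_blocks and tiled_blocks, the tile size and the block size are all hypothetical choices for illustration and say nothing about the actual hardware.

```cpp
#include <cstddef>
#include <set>
#include <utility>

// Row-major storage: blocks touched by a w x h window at (x0, y0) when
// memory is fetched in runs of block_texels consecutive texels.
inline std::size_t linear_blocks(std::size_t x0, std::size_t y0, std::size_t w,
                                 std::size_t h, std::size_t width,
                                 std::size_t block_texels) {
    std::set<std::size_t> blocks;
    for (std::size_t y = y0; y < y0 + h; ++y)
        for (std::size_t x = x0; x < x0 + w; ++x)
            blocks.insert((y * width + x) / block_texels);
    return blocks.size();
}

// Assumed tiled storage: each tile x tile square of texels is one block.
inline std::size_t tiled_blocks(std::size_t x0, std::size_t y0, std::size_t w,
                                std::size_t h, std::size_t tile) {
    std::set<std::pair<std::size_t, std::size_t>> tiles;
    for (std::size_t y = y0; y < y0 + h; ++y)
        for (std::size_t x = x0; x < x0 + w; ++x)
            tiles.insert({x / tile, y / tile});
    return tiles.size();
}
```

In a 64-texel-wide image with 16 texels per block, an aligned 4x4 window touches 4 blocks in the row-major model but only 1 block in the 4x4 tiled model. A 16x1 window shows the opposite: 1 block row-major, 4 blocks tiled. A tiled layout trades some speed along one axis for balanced behavior in both.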
The texture cache works very well. See my post in the other thread. Using the texture cache in this example speeds up execution by a factor of about 1.7 per thread, from a mean of 77220 to 45156 (screenshots 3 & 5).