texture cache and L2 cache

770966796 · March 18, 2014, 6:11pm

Hi all, I am also confused about how the texture cache connects with the L2 cache. As we know, a texture cache line is a 2D block. However, when data is not in the texture cache, it is looked up in the L2 cache. So in the L2 cache, is the cache line for texture data also a 2D block? If not, how do the two caches communicate with each other? Thanks.
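To make the question concrete, here is a minimal sketch of one way a "2D" cache line can coexist with an ordinary linearly addressed L2: if texels are laid out in memory so that a small 2D tile occupies one contiguous 32-byte segment, the next level only ever sees plain byte addresses. The 4-byte texels, the 4x2 tile shape, and the names `tiled_offset` and `segment_of` are hypothetical assumptions for illustration, not NVIDIA's documented layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: 4-byte texels grouped into 4x2 tiles, so each tile
// fills exactly one 32-byte segment. Tiles are stored row by row.
constexpr std::uint64_t kTexelBytes = 4;
constexpr std::uint64_t kTileW = 4;
constexpr std::uint64_t kTileH = 2;
constexpr std::uint64_t kSegmentBytes = kTexelBytes * kTileW * kTileH;  // 32

// Byte offset of texel (x, y) in an image `width` texels wide.
std::uint64_t tiled_offset(std::uint64_t x, std::uint64_t y, std::uint64_t width) {
    assert(width % kTileW == 0 && x < width);
    const std::uint64_t tiles_per_row = width / kTileW;
    const std::uint64_t tile = (y / kTileH) * tiles_per_row + (x / kTileW);
    const std::uint64_t within = (y % kTileH) * kTileW + (x % kTileW);
    return tile * kSegmentBytes + within * kTexelBytes;
}

// What the next cache level sees: only an aligned 32-byte segment index.
std::uint64_t segment_of(std::uint64_t byte_offset) {
    return byte_offset / kSegmentBytes;
}
```

Under these assumptions texels (0,0) through (3,1) all land in segment 0, so a "2D block" in the texture cache is just one 1D segment from the L2's point of view.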

scottgray · March 18, 2014, 8:53pm

Looking at the Nsight Memory Statistics, it seems that the transaction size requested from L2 is the same as the size coming out of the texture cache (32 bytes). So I'm guessing the transactions are for exactly the same data, which jibes with this statement:

“Texture memory is designed for streaming fetches with a constant latency; a texture cache hit reduces device memory bandwidth usage, but not fetch latency.”

Found here:

http://docs.nvidia.com/nsight-visual-studio-edition/4.0/Nsight_Visual_Studio_Edition_User_Guide.htm#Analysis/Report/CudaExperiments/KernelLevel/MemoryStatisticsTexture.htm#Chart
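If every texture-cache-to-L2 transaction is 32 bytes, the transactions per request can be estimated by counting the distinct aligned 32-byte segments a request's addresses touch. This is a minimal sketch under that assumption; `transactions_for_request` and `strided_addresses` are hypothetical helpers, and each access is assumed not to straddle a segment boundary (true for naturally aligned 4-byte loads).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Count the distinct aligned segments touched by one request's byte addresses.
std::size_t transactions_for_request(const std::vector<std::uint64_t>& addrs,
                                     std::uint64_t segment_bytes = 32) {
    assert(segment_bytes > 0);
    std::set<std::uint64_t> segments;
    for (std::uint64_t a : addrs) segments.insert(a / segment_bytes);
    return segments.size();
}

// Byte addresses for `count` threads, each `stride_bytes` apart, starting at `base`.
std::vector<std::uint64_t> strided_addresses(std::uint64_t base,
                                             std::uint64_t stride_bytes,
                                             std::size_t count) {
    std::vector<std::uint64_t> addrs;
    addrs.reserve(count);
    for (std::size_t i = 0; i < count; ++i) addrs.push_back(base + i * stride_bytes);
    return addrs;
}
```

For example, 32 threads reading consecutive 4-byte values from an aligned base touch 4 segments, the same read shifted by 16 bytes touches 5, and a 128-byte stride touches 32.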

The CUDA Handbook goes into a little bit of detail about how the 2D locality might be implemented:

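The CUDA Handbook: A Comprehensive Guide to GPU Programming, by Nicholas Wilt (Google Books): http://books.google.com/books?id=ynydqKP225EC&lpg=PP1&dq=The%20CUDA%20Handbook&pg=PT292#v=onepage&q&f=false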
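One well-known way to give a linearly addressed cache 2D locality is Z-order (Morton) addressing, which interleaves the bits of x and y. Whether the book or the hardware uses exactly this scheme is an assumption here; the sketch only shows the technique. `morton2d` and `morton_line` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Interleave the bits of x (even bit positions) and y (odd bit positions)
// to get the Z-order (Morton) index of texel (x, y).
std::uint32_t morton2d(std::uint16_t x, std::uint16_t y) {
    std::uint32_t code = 0;
    for (unsigned bit = 0; bit < 16; ++bit) {
        code |= static_cast<std::uint32_t>((x >> bit) & 1u) << (2 * bit);
        code |= static_cast<std::uint32_t>((y >> bit) & 1u) << (2 * bit + 1);
    }
    return code;
}

// Cache line index holding texel (x, y) if texels were stored in Morton order.
std::uint64_t morton_line(std::uint16_t x, std::uint16_t y,
                          std::uint64_t texel_bytes, std::uint64_t line_bytes) {
    assert(texel_bytes > 0 && line_bytes >= texel_bytes &&
           line_bytes % texel_bytes == 0);
    return static_cast<std::uint64_t>(morton2d(x, y)) * texel_bytes / line_bytes;
}
```

With 4-byte texels and 32-byte lines, Morton indices 0 to 7 cover texels (0..3, 0..1), so each line holds a 4x2 tile and vertical neighbours often share a line, unlike in a row-major layout.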

But I guess the key is to play with the geometry of your requests to raise your hit rate and lower your transactions per request.
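As a rough illustration of why request geometry matters, this sketch counts the 32-byte segments a 32-texel patch touches in a plain row-major buffer of 4-byte texels (for example, linear memory read through the texture path). The row-major layout, texel size, and the name `segments_for_patch` are assumptions; a CUDA array's internal layout may behave differently.

```cpp
#include <cassert>
#include <cstdint>

// Assumptions (hypothetical): 4-byte texels in a row-major buffer,
// 32-byte aligned segments as the unit of transfer.
constexpr std::uint64_t kTexelBytes = 4;
constexpr std::uint64_t kSegmentBytes = 32;

// Distinct segments touched when reading a w x h patch whose top-left texel
// is (x0, y0) in an image `width` texels wide. Each patch row is contiguous,
// so it costs (last segment - first segment + 1) segments.
std::uint64_t segments_for_patch(std::uint64_t x0, std::uint64_t y0,
                                 std::uint64_t w, std::uint64_t h,
                                 std::uint64_t width) {
    assert(w > 0 && h > 0);
    // At least one full segment of gap between patch rows, so no two rows
    // can share a segment and the per-row sum stays exact.
    assert(width >= x0 + w && width - w >= kSegmentBytes / kTexelBytes);
    std::uint64_t total = 0;
    for (std::uint64_t y = y0; y < y0 + h; ++y) {
        const std::uint64_t first = (y * width + x0) * kTexelBytes;
        const std::uint64_t last = (y * width + x0 + w) * kTexelBytes - 1;
        total += last / kSegmentBytes - first / kSegmentBytes + 1;
    }
    return total;
}
```

In a 1024-texel-wide image, a 32x1 or 8x4 patch touches 4 segments, a 4x8 patch touches 8, and a 1x32 column touches 32, so reshaping the per-warp footprint can change transactions per request by a large factor.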

770966796 · March 18, 2014, 9:57pm


Thanks for your reply.

I also found some documents, but they describe different configurations:

First paper (texture cache, then L2 cache):
http://www.gris.informatik.tu-darmstadt.de/projects/gpu_cache_behavior/data/13rp003-GRIS.pdf

In Section 4.1, it gives the L1 texture cache line size as 128 B.

Second paper (texture L1 cache, then texture L2 cache):
http://www.eecg.toronto.edu/~myrto/gpuarch-ispass2010.pdf

In section H, you can find that the texture L1 cache line is 32 B and the texture L2 cache line is 256 B.

Are these two descriptions consistent? That is, is the texture L2 cache the same L2 cache that is used for global loads?

scottgray · March 19, 2014, 12:47pm

Not sure why the first paper claims a 128 byte L1 cache line. The second paper seems more reliable.

So a texture fetch gets 32 bytes at a time from the cache, which gets 32 bytes at a time from L2, which gets 256 bytes at a time from device memory. The actual geometry of that data is opaque, but with some tweaking you should be able to find a sweet spot.
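The granularities above can be turned into simple address arithmetic: which 32-byte texture line an address falls in, which 256-byte L2 line (device-memory fetch) contains it, and which 32-byte piece of that L2 line it is. This is a sketch assuming the 32 B / 256 B figures quoted in this thread (from the second paper's older GPU); other GPU generations may differ. `LineLocation` and `locate` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct LineLocation {
    std::uint64_t tex_line;  // index of the 32-byte texture-cache line
    std::uint64_t l2_line;   // index of the 256-byte L2 line / device-memory fetch
    std::uint64_t sector;    // which 32-byte piece of that L2 line
};

LineLocation locate(std::uint64_t byte_addr,
                    std::uint64_t tex_line_bytes = 32,
                    std::uint64_t l2_line_bytes = 256) {
    assert(tex_line_bytes > 0 && l2_line_bytes >= tex_line_bytes &&
           l2_line_bytes % tex_line_bytes == 0);
    return {byte_addr / tex_line_bytes,
            byte_addr / l2_line_bytes,
            (byte_addr % l2_line_bytes) / tex_line_bytes};
}
```

For example, byte 300 sits in texture line 9, which is sector 1 of L2 line 1; one 256-byte fetch can therefore serve up to 8 distinct 32-byte texture-cache misses.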
