Texture cache architecture: line size of texture cache

I was looking at binding a chunk of linear memory to a texture reference in order to take advantage of the texture cache. The kernel I am writing will exhibit some 1D spatial locality as well as temporal locality, so hopefully the texture cache will suit it nicely.
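
For reference, the setup I have in mind is roughly the following (just a minimal sketch with placeholder names, using the texture reference API):

```
#include <cuda_runtime.h>

// Texture reference bound to plain linear device memory (no cudaArray).
texture<float, 1, cudaReadModeElementType> texRef;

__global__ void readKernel(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch(texRef, i);   // cached read through the texture unit
}

int main()
{
    const int n = 1 << 20;
    float *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));

    // Bind the linear buffer to the texture reference.
    cudaBindTexture(0, texRef, d_in, n * sizeof(float));

    readKernel<<<n / 256, 256>>>(d_out, n);

    cudaUnbindTexture(texRef);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```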

However, I am confused about the architecture of the texture cache, which I can't really find any specifics on. Exactly what data will a cache miss bring into the texture cache, and how much, assuming the texture reference is bound to linear memory? That is, will it bring in a chunk of contiguous linear memory, and if so, how large a chunk?

If this depends on the GPU architecture: the kernel will most likely be running on either a Tesla C870 or a 9800 GTX.

Thanks for any clarifications!

In my extensive testing, I have found that you get the most out of the texture cache with spatially local accesses within each warp. Temporal locality matters not at all, because the large number of other warps running on that multiprocessor will cause the first values to be flushed from the cache before the first warp gets to its next read.
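
To make "spatially local within a warp" concrete, here is the kind of contrast I mean (an illustrative sketch, not my actual benchmark code):

```
texture<float, 1, cudaReadModeElementType> texRef;

// Good: consecutive threads fetch consecutive texels, so one warp's 32
// fetches land in a small contiguous window of the texture.
__global__ void localReads(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch(texRef, i);
}

// Poor: each thread strides far away from its neighbors, so one warp's
// fetches are scattered across many cache lines and the cache cannot help.
__global__ void scatteredReads(float *out, int n, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch(texRef, (i * stride) % n);
}
```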

MisterAnderson or Others,

I am trying to figure out how local texture accesses have to be to benefit from the cache, and I am looking for some guidance.

I am considering using a texture fetch on a 2D array of float4s in order to access, within one warp, four float4s on the same row with a stride of 16 columns. (I'm using a 2D array for what is really 1D access, for reasons that aren't relevant here.) Any idea whether such an access pattern would have few cache misses? What if I used a 1D texture fetch instead? See the sketch below for the indexing I have in mind.
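
In code, the pattern looks roughly like this (my own sketch with made-up names, just to describe the indexing):

```
texture<float4, 2, cudaReadModeElementType> texRef2D;

// Within one warp, the 32 threads fall into four groups of eight; each group
// reads the same row at columns base, base+16, base+32, base+48.
__global__ void fetchRowStrided(float4 *out, int row, int base)
{
    int lane = threadIdx.x & 31;           // lane within the warp
    int col  = base + (lane >> 3) * 16;    // four columns, 16 apart
    int i    = blockIdx.x * blockDim.x + threadIdx.x;
    out[i]   = tex2D(texRef2D, col + 0.5f, row + 0.5f);
}
```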

It seems like it would be difficult to give a hard answer but if you could help me think about how to approximate this, it would be really helpful.

Thanks,
Danko

If I had to guess, I'd say that pattern is probably close enough to get some benefit from the texture cache. How much benefit? I don't know. It would only take a few minutes to write a quick microbenchmark to measure the memory bandwidth you get with that access pattern.
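
Something along these lines would do the job; it's a bare-bones skeleton (made-up names, array fill and cleanup omitted), so swap in your exact indexing:

```
#include <cstdio>
#include <cuda_runtime.h>

texture<float4, 2, cudaReadModeElementType> texRef2D;

__global__ void benchKernel(float4 *out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        out[y * width + x] = tex2D(texRef2D, x + 0.5f, y + 0.5f);  // replace with your pattern
}

int main()
{
    const int width = 1024, height = 1024, iters = 100;

    // Allocate a cudaArray and bind the 2D texture reference to it.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float4>();
    cudaArray *arr;
    cudaMallocArray(&arr, &desc, width, height);
    cudaBindTextureToArray(texRef2D, arr);

    float4 *d_out;
    cudaMalloc(&d_out, width * height * sizeof(float4));

    dim3 block(16, 16), grid(width / 16, height / 16);

    // Time repeated launches with CUDA events.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i)
        benchKernel<<<grid, block>>>(d_out, width, height);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double bytes = (double)iters * width * height * sizeof(float4) * 2;  // read + write
    printf("Effective bandwidth: %.1f GB/s\n", bytes / (ms * 1e6));
    return 0;
}
```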

Since you are reading across rows, I wouldn't expect a performance difference between the 1D and 2D fetches (besides the extra cudaArray setup time needed for the tex2D texture).