CUDA texture cache

santyhyammer · May 21, 2008, 1:10am

How big is the G80’s texture cache? 6 MB? 4 MB? 2 MB?
I ask because I need to traverse a big tree stored in a texture, and the algorithm has no coherency (so the kernel is going to visit essentially random parts of the texture, and shared memory cannot help much).

thx

seibert · May 21, 2008, 1:42am

The programming guide (A.1.1) says 6-8 kB per multiprocessor. I assume the exact value depends on the GPU. That’s not going to help you much if you can’t get any data locality.

santyhyammer · May 21, 2008, 3:06am

Well, if a cache miss costs only 1/2 cycles then it will be OK, compared with the 600 of a global memory read…

E.D_Riedijk · May 21, 2008, 4:57am

Having no data locality at all means the data is fetched from global memory.

inducer · June 6, 2008, 11:28pm

Relatedly: Is the texture cache shared between all multiprocessors?

In other words: Do I benefit if I have fetch locality that’s not within the same thread block?

Andreas

PS: I know that not all of the potentially-local thread blocks will be scheduled at the same time, but I’d expect that that would be true at least of a certain fraction.

santyhyammer · June 7, 2008, 12:12am

From the CUDA2 docs, section "3.1 A Set of SIMD Multiprocessors with On-Chip Shared Memory":

A read-only texture cache that is shared by all the processors and speeds up reads from the texture memory space, which is implemented as a read-only region of device memory.

And from A.1.1:

The cache working set for texture memory varies between 6 and 8 KB per multiprocessor;

So it’s not very clear if it’s shared or not… haha!

inducer · June 7, 2008, 12:34am

Thanks for the quick reply!

You’re right, even with that it’s still not clear. In 3.1, the phrase “that is shared by all processors” is used in reference to shared memory as well.

Andreas

E.D_Riedijk · June 7, 2008, 6:59am

There are some other papers out there that provide more detail:

The hardware is organized into Texture Processor Clusters. Each cluster holds the texture unit with its cache (16 kB) and contains 2 multiprocessors, so each multiprocessor has 8 kB of texture cache ‘available’. Each multiprocessor (which contains 8 Shader Processing Units) also has 16 kB of shared memory that is accessible from those 8 SPUs.
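
To put the figures above next to the original question, here is a rough host-side sketch of the arithmetic. The 16 kB per cluster and 2 multiprocessors per cluster are the numbers quoted in this thread. The node size, the tree size, and the function names are hypothetical, and the hit-rate estimate assumes uniformly random fetches while ignoring cache-line granularity, so treat it as an order-of-magnitude guide only.

```cpp
#include <algorithm>
#include <cstddef>

// Figures quoted in the thread for G80-class hardware.
constexpr std::size_t kTextureCacheBytesPerCluster = 16 * 1024;
constexpr std::size_t kMultiprocessorsPerCluster = 2;

// Texture cache share 'available' to one multiprocessor, in bytes.
constexpr std::size_t cacheBytesPerMultiprocessor() {
    return kTextureCacheBytesPerCluster / kMultiprocessorsPerCluster;
}

// Number of tree nodes that fit in that share.
// nodeBytes is a hypothetical node size, not something stated in the thread.
constexpr std::size_t nodesPerMultiprocessor(std::size_t nodeBytes) {
    return cacheBytesPerMultiprocessor() / nodeBytes;
}

// Crude hit-rate estimate for uniformly random fetches: the fraction of the
// tree that can be resident in the cache share at once, capped at 1.
// Assumption: no locality at all and no cache-line effects.
inline double roughHitRate(std::size_t treeBytes) {
    const double resident = static_cast<double>(cacheBytesPerMultiprocessor());
    return std::min(1.0, resident / static_cast<double>(treeBytes));
}
```

For example, with hypothetical 16-byte nodes, `nodesPerMultiprocessor(16)` gives 512 nodes, and for a 4 MB tree `roughHitRate(4 * 1024 * 1024)` comes out at roughly 0.2%. Under these assumptions nearly every fetch of a randomly traversed large tree would go to global memory, which matches the replies above.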
