CUDA texture cache

How big is the G80’s texture cache? 6 MB? 4 MB? 2 MB?
I ask because I need to traverse a big tree stored in a texture, and the algorithm has no coherency (the kernel is going to visit very random parts of the texture, so shared memory cannot help much).
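For concreteness, here is a minimal sketch of what such a traversal kernel might look like with the (G80-era) texture reference API. The node layout, names, and leaf-sentinel convention are all illustrative assumptions, not actual code from my project:

```cuda
// Hypothetical sketch: binary search tree stored in a 1D texture.
// Each node is an int2: .x = key, .y = index of left child
// (right child assumed at .y + 1); negative index = leaf sentinel.
texture<int2, 1, cudaReadModeElementType> treeTex;

__global__ void findLeaf(const int *queries, int *result, int numQueries)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= numQueries) return;

    int key = queries[tid];
    int node = 0;                        // start at the root
    while (node >= 0) {
        int2 n = tex1Dfetch(treeTex, node);
        result[tid] = node;              // remember last node visited
        node = (key < n.x) ? n.y : n.y + 1;
    }
}
```

Every thread takes a different path down the tree, which is exactly the incoherent access pattern I’m worried about.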


The programming guide (A.1.1) says 6-8 kB per multiprocessor. I assume the exact value depends on the GPU. That’s not going to help you much if you can’t get any data locality.

Well, if a cache hit costs only 1–2 cycles then it will be OK… compared with the ~600 cycles of a global memory read…

Having no data locality at all means fetching the data from global memory.

Relatedly: Is the texture cache shared between all multiprocessors?

In other words: Do I benefit if I have fetch locality that’s not within the same thread block?


PS: I know that not all of the potentially-local thread blocks will be scheduled at the same time, but I’d expect that that would be true at least of a certain fraction.

From the CUDA 2.0 docs, section 3.1, “A Set of SIMD Multiprocessors with On-Chip Shared Memory”:

“A read-only texture cache that is shared by all the processors and speeds up reads from the texture memory space, which is implemented as a read-only region of device memory.”

And from A.1.1:

“The cache working set for texture memory varies between 6 and 8 KB per multiprocessor.”

So it’s not very clear if it’s shared or not… haha!

Thanks for the quick reply!

You’re right, even with that it’s still not clear. In 3.1, the phrase “that is shared by all processors” is used in reference to shared memory as well.


There are some other papers out there that provide more detail:

You have a Texture Processor Cluster (TPC). That is where the texture unit sits, with its 16 KB cache. Each cluster contains two multiprocessors, so each multiprocessor has 8 KB of texture cache “available”. And each multiprocessor (which contains 8 shader processing units) has 16 KB of shared memory accessible from those 8 SPUs.