Why is tex1Dfetch 10-15 times faster than global memory?

itmanager85 · December 30, 2011, 11:02am

Why is tex1Dfetch 10-15 times faster than global memory?

Jimmy_Pettersson · December 30, 2011, 1:17pm

Because the data is being cached and reused by multiple threads possibly across multiple blocks.

If you are running a Fermi GPU, one would expect the much larger L1 cache to do the same job for you automatically. Which GPU are you using?

pasoleatis · December 30, 2011, 4:22pm

It is cached on chip, so the data is closer to the cores. But the large gain applies mainly to cards with compute capability below 2.0, where global memory loads are not cached. Fermi has L1 and L2 caches, so in many situations it can be faster not to use textures, though textures can still help with non-coalesced accesses.
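To make the non-coalesced case concrete, here is a minimal sketch of the same gather done once through plain global loads and once through `tex1Dfetch`. It assumes the texture object API (compute capability 3.0 and later) rather than the texture reference API that was current when this thread was written. The names `gatherGlobal`, `gatherTexture`, `makeLinearTexture` and `gatherResultsMatch` are made up for illustration, and the stride-33 index pattern is just one arbitrary way to get scattered reads. It only checks that both paths return the same data; it says nothing about which one is faster on a given card.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Gather through ordinary global-memory loads.
__global__ void gatherGlobal(const float* src, const int* idx, float* dst, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dst[i] = src[idx[i]];
}

// Same gather, but the source array is read through the texture path.
__global__ void gatherTexture(cudaTextureObject_t tex, const int* idx, float* dst, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dst[i] = tex1Dfetch<float>(tex, idx[i]);
}

// Hypothetical helper: wraps a linear device buffer of floats in a texture object.
cudaTextureObject_t makeLinearTexture(float* devPtr, size_t count)
{
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = devPtr;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = count * sizeof(float);

    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;  // return raw floats, no normalization

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);
    return tex;
}

// Hypothetical driver: runs both kernels and reports whether the outputs agree.
bool gatherResultsMatch()
{
    const int n = 1 << 20;
    std::vector<float> hSrc(n), hA(n), hB(n);
    std::vector<int> hIdx(n);
    for (int i = 0; i < n; ++i) {
        hSrc[i] = static_cast<float>(i);
        hIdx[i] = (i * 33 + 7) % n;  // neighbouring threads read 33 elements apart
    }

    float *dSrc = nullptr, *dA = nullptr, *dB = nullptr;
    int* dIdx = nullptr;
    cudaMalloc(&dSrc, n * sizeof(float));
    cudaMalloc(&dA, n * sizeof(float));
    cudaMalloc(&dB, n * sizeof(float));
    cudaMalloc(&dIdx, n * sizeof(int));
    cudaMemcpy(dSrc, hSrc.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dIdx, hIdx.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    cudaTextureObject_t tex = makeLinearTexture(dSrc, n);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    gatherGlobal<<<blocks, threads>>>(dSrc, dIdx, dA, n);
    gatherTexture<<<blocks, threads>>>(tex, dIdx, dB, n);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    cudaMemcpy(hA.data(), dA, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(hB.data(), dB, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n && ok; ++i)
        ok = (hA[i] == hB[i]) && (hA[i] == hSrc[hIdx[i]]);

    cudaDestroyTextureObject(tex);
    cudaFree(dSrc); cudaFree(dA); cudaFree(dB); cudaFree(dIdx);
    return ok;
}
```

Calling `gatherResultsMatch()` from a `main` and building with `nvcc` should be enough to try it. Error checking is kept to a single synchronize call to keep the sketch short.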

Jimmy_Pettersson · January 3, 2012, 5:25am

@pasoleatis

Do you know of any good theoretical comparisons for CC 2.x that specifically benchmark the texture cache against the L1 and L2 caches?

The texture cache could potentially be much faster for interpolation, but I wonder if it might also be more efficient when all blocks extensively reuse a small amount of data (<< 8 KB).
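One way to probe the small-table question on your own card is a micro-benchmark along these lines: every block repeatedly reads the same 4 KB table, once via global loads and once via `tex1Dfetch`, timed with CUDA events. This is only a sketch under assumptions. It uses the texture object API (compute capability 3.0 and later), and the names `sumGlobal`, `sumTexture`, `Timing` and `timeSmallTableReuse`, the table size, the read count and the launch configuration are all arbitrary choices for illustration. The numbers will depend on the architecture and compiler, so no expected timings are given.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int kTableSize = 1024;  // 1024 floats = 4 KB, well under 8 KB
constexpr int kReads = 256;       // table reads per thread

// Every thread sums kReads table entries fetched with plain global loads.
__global__ void sumGlobal(const float* table, float* out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float acc = 0.0f;
    for (int k = 0; k < kReads; ++k)
        acc += table[(tid + k * 17) % kTableSize];
    out[tid] = acc;
}

// Same access pattern, but fetched through the texture path.
__global__ void sumTexture(cudaTextureObject_t tex, float* out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float acc = 0.0f;
    for (int k = 0; k < kReads; ++k)
        acc += tex1Dfetch<float>(tex, (tid + k * 17) % kTableSize);
    out[tid] = acc;
}

struct Timing { float globalMs; float textureMs; bool sameResult; };

// Hypothetical driver: times one launch of each kernel after a warm-up run.
Timing timeSmallTableReuse()
{
    const int blocks = 256, threads = 256, n = blocks * threads;
    std::vector<float> hTable(kTableSize), hG(n), hT(n);
    for (int i = 0; i < kTableSize; ++i)
        hTable[i] = static_cast<float>(i % 7);  // small integers keep float sums exact

    float *dTable = nullptr, *dG = nullptr, *dT = nullptr;
    cudaMalloc(&dTable, kTableSize * sizeof(float));
    cudaMalloc(&dG, n * sizeof(float));
    cudaMalloc(&dT, n * sizeof(float));
    cudaMemcpy(dTable, hTable.data(), kTableSize * sizeof(float), cudaMemcpyHostToDevice);

    // Wrap the table in a texture object over linear memory.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = dTable;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = kTableSize * sizeof(float);
    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;
    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    // Warm-up launches so the timed runs do not include one-time startup cost.
    sumGlobal<<<blocks, threads>>>(dTable, dG);
    sumTexture<<<blocks, threads>>>(tex, dT);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    Timing t = {};

    cudaEventRecord(start);
    sumGlobal<<<blocks, threads>>>(dTable, dG);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&t.globalMs, start, stop);

    cudaEventRecord(start);
    sumTexture<<<blocks, threads>>>(tex, dT);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&t.textureMs, start, stop);

    cudaMemcpy(hG.data(), dG, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(hT.data(), dT, n * sizeof(float), cudaMemcpyDeviceToHost);
    t.sameResult = (hG == hT);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaDestroyTextureObject(tex);
    cudaFree(dTable); cudaFree(dG); cudaFree(dT);
    return t;
}
```

A single timed launch is noisy, so in practice you would average over many launches and compare `globalMs` against `textureMs` across several table sizes before drawing any conclusion.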

pasoleatis · January 3, 2012, 12:14pm

I think it is mentioned in the Fermi tuning guide. They compare the speed of the texture cache to the speed of L1, so it is more of a theoretical comparison.

Jimmy_Pettersson · January 3, 2012, 3:00pm

Yeah, they don't really go as in-depth on the subject as one would like:

Fermi tuning guide:

pasoleatis · January 3, 2012, 3:55pm

It depends a lot on the problem. Sometimes, when a lot of data is cached, using textures can still improve performance by taking some of the load off the L1 cache.
