Texture Reads: What is the source of performance increase?

Koobas · March 9, 2011, 10:48pm

I am trying to understand the source of performance increase when using textures on Fermi.
I am speculating here.
I would appreciate it if someone confirmed or denied my suspicions.

According to the documentation there should be no direct performance increase.
However, also according to the documentation, texture reads bypass L1.

So, correct me if I am wrong.
If I am loading into shared memory without using textures,
I read into a register (through L1) and then store to shared memory.
That makes no sense at all, because all I am accomplishing is polluting L1.

I can disable caching in L1 through a compiler flag, but that will also disable L1 caching for local variables, which I want cached in L1.
So, in other words, I want caching for local variables in L1 (so I should not disable L1 caching),
but I don’t want L1 caching for my “actual data”, so I should declare it as a texture (if it is read-only).

Did I get it right?

ceearem · March 9, 2011, 10:59pm

Most often the reason is that non-texture reads have a cache line size of 128 bytes, as opposed to 32 bytes for texture reads.
So if you access random 32-byte or smaller structures in memory (a float4, for example, is 16 bytes), you only need to load a quarter as much data through the texture cache as you would through L1/L2. While the peak bandwidth of the L2 cache is higher than that of the texture cache, textures are still far better for random access.

Ceearem

P.S. google is your friend: “cuda fermi texture L2” gives the following two posts in these forums:

Koobas · March 9, 2011, 11:09pm

Okay, nothing random about my access.

Always fetching in chunks of 128 bytes.

Koobas · March 9, 2011, 11:12pm

What about loading shared memory?

The transfer is device_memory → registers → shared_memory, right?

So with L1 on, the transfer is device_memory → L2 → L1 → registers → shared_memory, right?

So, if the access is “read only”, it only makes sense to use textures for the data, right?
