I am trying to understand the source of the performance increase when using textures on Fermi.
I am speculating here.
I would appreciate it if someone could confirm or refute my suspicions.
According to the documentation, there should be no direct performance increase.
However, also according to the documentation, texture reads bypass L1.
So, correct me if I am wrong:
if I load data into shared memory without using textures,
each value is read into a register (through L1) and then stored to shared memory.
Caching that read makes no sense at all, because the data ends up in shared memory anyway and all I accomplish is polluting L1.
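To make that concrete, this is roughly the plain load path I mean. It is only a sketch: the kernel name `stage_plain`, the tile size `TILE`, and the reversal used as a placeholder computation are made up for illustration, and it assumes a launch with `blockDim.x == TILE`.

```cuda
#include <cuda_runtime.h>

#define TILE 256  // hypothetical tile size, one element per thread

// Plain path: global memory -> register -> shared memory.
// With default compiler settings on Fermi, the global load below
// goes through L1, even though the value is only staged once.
__global__ void stage_plain(const float *in, float *out, int n)
{
    __shared__ float tile[TILE];
    int i = blockIdx.x * TILE + threadIdx.x;

    float v = (i < n) ? in[i] : 0.0f;  // global load into a register
    tile[threadIdx.x] = v;             // register stored to shared memory
    __syncthreads();

    // Placeholder work: each thread reads its mirror slot in the tile.
    if (i < n) out[i] = tile[TILE - 1 - threadIdx.x];
}
```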
I can disable L1 caching through a compiler flag, but as far as I understand that will also disable L1 caching for local variables, which I want cached in L1.
In other words, I want local variables cached in L1 (so I should not disable L1 caching),
but I don’t want L1 caching for my “actual data”, so I should declare it as a texture (if it is read-only).
Did I get it right?
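
For comparison, here is a minimal sketch of the texture variant I have in mind. It uses the legacy texture reference API (`texture<>`, `cudaBindTexture`, `tex1Dfetch`) from Fermi-era toolkits; as far as I know that API was later deprecated and removed in CUDA 12, so this assumes an older toolkit. The names `texIn`, `stage_tex`, and `TILE` are hypothetical, and error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 256  // hypothetical tile size, one element per thread

// Legacy texture reference; must be declared at file scope.
texture<float, 1, cudaReadModeElementType> texIn;

// Texture path: the read goes through the texture cache instead of L1,
// and the fetched value is then stored to shared memory.
__global__ void stage_tex(float *out, int n)
{
    __shared__ float tile[TILE];
    int i = blockIdx.x * TILE + threadIdx.x;

    tile[threadIdx.x] = (i < n) ? tex1Dfetch(texIn, i) : 0.0f;
    __syncthreads();

    // Placeholder work: each thread reads its mirror slot in the tile.
    if (i < n) out[i] = tile[TILE - 1 - threadIdx.x];
}

int main()
{
    const int n = 1024;  // chosen as a multiple of TILE
    float h[n];
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *dIn = 0, *dOut = 0;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMemcpy(dIn, h, n * sizeof(float), cudaMemcpyHostToDevice);

    // Bind the linear device buffer to the texture reference.
    cudaBindTexture(0, texIn, dIn, n * sizeof(float));
    stage_tex<<<n / TILE, TILE>>>(dOut, n);
    cudaMemcpy(h, dOut, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaUnbindTexture(texIn);

    printf("out[0] = %f\n", h[0]);

    cudaFree(dIn);
    cudaFree(dOut);
    return 0;
}
```

The only difference from a plain global load is the `tex1Dfetch` call and the bind/unbind on the host side; whether this actually keeps L1 free for local variables is exactly what I am asking about.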