What happens in hardware during L2-cached texture interpolation?

As you know, a texture can be read with floating-point coordinates instead of integer memory indices. In my experience this is faster than interpolating the data oneself, since texture filtering is “hardware accelerated” (according to all the docs).

But what is going on in that hardware acceleration? Does the L2 cache have some sort of software that interpolates? Is it some sort of application-specific integrated circuit (ASIC) that does the job? Does that happen before or after the L2 cache loads the data from global texture memory?

All DRAM accesses go through the L2 cache. Beyond that point, the texture cache and texture hardware are separate from other entities such as the L1 cache. Texturing, including interpolation, is handled by the texture unit, which is part of the SM. The texture unit consists of a cache, interpolation hardware, and the other functions necessary for texturing.
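To give a rough functional picture of what the interpolation hardware computes (not how it is wired internally), here is a minimal C++ software model of the one-dimensional linear filtering formula given in the texture-fetching appendix of the CUDA C Programming Guide: tex(x) = (1 - a) T[i] + a T[i+1], with xB = x - 0.5, i = floor(xB), a = frac(xB), and the weight a held in 9-bit fixed point with 8 fractional bits. The function names are hypothetical, and unnormalized coordinates, clamp addressing and round-to-nearest quantization of the weight are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical software model of 1D linear texture filtering, following the
// formula in the CUDA C Programming Guide:
//   tex(x) = (1 - a) * T[i] + a * T[i + 1], xB = x - 0.5, i = floor(xB), a = frac(xB)
// Assumptions (not confirmed by the thread): unnormalized coordinates,
// clamp addressing, and round-to-nearest when quantizing the weight.

// Read texel i, clamping out-of-range indices to the edge (assumed clamp mode).
float fetchClamp(const std::vector<float>& T, long i) {
    assert(!T.empty());
    const long last = static_cast<long>(T.size()) - 1;
    return T[static_cast<std::size_t>(std::clamp(i, 0L, last))];
}

// The guide stores the weight in 9-bit fixed point with 8 fractional bits,
// so it can only take multiples of 1/256 (1.0 included). Rounding to nearest
// is an assumption of this sketch.
float quantizeWeight(float a) {
    return std::round(a * 256.0f) / 256.0f;
}

// Linearly filtered fetch at unnormalized coordinate x.
float tex1DLinearModel(const std::vector<float>& T, float x) {
    const float xB = x - 0.5f;          // shift so texel centres land on integers
    const float fi = std::floor(xB);    // index of the left neighbour
    const float a = quantizeWeight(xB - fi);
    const long i = static_cast<long>(fi);
    return (1.0f - a) * fetchClamp(T, i) + a * fetchClamp(T, i + 1);
}
```

For example, with T = {0, 10, 20, 30}, x = 1.5 is the centre of texel 1 and returns 10, while x = 2.0 lies halfway between the centres of texels 1 and 2 and returns 15. The coarse 1/256 weight also shows why hardware filtering can differ slightly from a full-precision interpolation done in your own code.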

Thanks txbob.

My question is more: what is the texture unit actually doing? What are that cache, the interpolation hardware, and the “other functions”?

The only descriptions I know of are in the programming guide. Here is one reference (there are others, just search through it):

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#texture-fetching
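For a two-dimensional texture, the linear-filtering description in that section of the guide amounts to a bilinear blend of the four nearest texels. The symbols follow the guide's notation (T is the texel array, x and y are unnormalized coordinates); please check the linked page for the authoritative wording:

```latex
\begin{aligned}
x_B &= x - 0.5, & i &= \lfloor x_B \rfloor, & \alpha &= \operatorname{frac}(x_B),\\
y_B &= y - 0.5, & j &= \lfloor y_B \rfloor, & \beta &= \operatorname{frac}(y_B),\\
\operatorname{tex}(x, y) &= (1-\alpha)(1-\beta)\,T[i,j] + \alpha(1-\beta)\,T[i+1,j] \\
&\quad + (1-\alpha)\beta\,T[i,j+1] + \alpha\beta\,T[i+1,j+1]
\end{aligned}
```

Here \alpha and \beta are stored in 9-bit fixed point with 8 bits of fractional value, so 1.0 is exactly representable.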

There is barely any information in that link (or in the other ones I have visited).
Thanks anyway.