When should I use texture memory?

Hello,

In what cases should I use texture memory (instead of direct global memory access)?

I’ve measured the performance of global memory access through texture memory in several cases.
However, it was slower than direct global memory access in every case I tried.

I wonder which cases are actually suitable for texture memory.

Also, does anyone know the detailed behavior of texture caching?

I understand that when texture memory is accessed, data that is spatially close in 2D is cached as well.
But how close?

And who shares the same texture cache:
threads in the same warp, the same block, or all threads?

Thanks in advance, and I’m sorry for my poor English.

You should use texture memory when:

  • you cannot coalesce your reads, but there is locality in your data access
  • you are accessing a 2D array, and you want to skip calculating 1D indices, or you have 2D locality
  • you want to interpolate your data (linear interpolation comes for free)

The cache is probably shared per Texture Processing Cluster (TPC), which means 2 multiprocessors on compute capability 1.0 and 1.1 hardware and 3 multiprocessors on 1.2 and 1.3 hardware.

I never bought the argument about skipping the 1D index calculation. Turning two 2D indices (which you need anyway in order to address a 2D texture) into one 1D index is a single MAD (i = x + y*xdim).

If you’re doing so many of those MADs that the cost might become measurable, you’re probably doing just as many memory reads afterwards, at which point your kernel is bandwidth bound. The exception is when you’re hitting the L1 texture cache, but at that point you’re benefiting from caching, not from indexing.

A single MAD before a memory read is nothing.

If you need the wrapping behavior, that’s just an additional modulo. For clamping behavior, use the min and max functions.
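To make the point above concrete, here is a rough host-side C++ sketch of the manual addressing. The function names (flat_index, clamp_coord, wrap_coord) are made up for illustration and are not part of any CUDA API; the sketch assumes a row-major array and integer coordinates. The same arithmetic could be used inside a kernel.

```cpp
#include <algorithm>
#include <cassert>

// Row-major flat index from 2D coordinates: a single multiply-add.
inline int flat_index(int x, int y, int xdim) {
    return x + y * xdim;
}

// Clamp addressing: out-of-range coordinates snap to the nearest edge.
inline int clamp_coord(int v, int n) {
    assert(n > 0);
    return std::min(std::max(v, 0), n - 1);
}

// Wrap addressing: coordinates repeat with period n.
// C++ '%' can return a negative remainder, so negative inputs are corrected.
inline int wrap_coord(int v, int n) {
    assert(n > 0);
    int r = v % n;
    return r < 0 ? r + n : r;
}

// Read with clamping on both axes (hypothetical helper).
inline float read_clamped(const float* data, int x, int y, int xdim, int ydim) {
    return data[flat_index(clamp_coord(x, xdim), clamp_coord(y, ydim), xdim)];
}

// Read with wrapping on both axes (hypothetical helper).
inline float read_wrapped(const float* data, int x, int y, int xdim, int ydim) {
    return data[flat_index(wrap_coord(x, xdim), wrap_coord(y, ydim), xdim)];
}
```

Note that the wrap needs one extra correction for negative coordinates, so in practice it is a modulo plus a conditional add rather than a bare modulo.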

Now, caching when coalescing is imperfect: that’s the more real advantage. Free filtering might make sense if you’re doing the bilinear kind. 1D linear filtering is probably more trouble than it’s worth (there are only 256 steps between values, and you only save about 5 arithmetic instructions).
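As a rough illustration of the "256 steps" remark, the sketch below compares a full-precision linear interpolation with one whose weight is quantized to multiples of 1/256. The function names are hypothetical, and rounding to the nearest step is an assumption made for the example; the exact behavior of the hardware filter may differ.

```cpp
#include <cassert>
#include <cmath>

// Quantize an interpolation weight in [0, 1] to multiples of 1/256.
// Assumption: round-to-nearest; this only approximates the hardware.
inline float quantize_weight(float t) {
    assert(t >= 0.0f && t <= 1.0f);
    return std::round(t * 256.0f) / 256.0f;
}

// Full-precision linear interpolation: one subtract and one multiply-add.
inline float lerp_exact(float a, float b, float t) {
    return a + t * (b - a);
}

// Linear interpolation with the weight limited to 256 steps.
inline float lerp_quantized(float a, float b, float t) {
    return lerp_exact(a, b, quantize_weight(t));
}
```

With a weight such as 0.001, the quantized version returns the left sample unchanged while the exact version does not, which is the precision loss to keep in mind before relying on the filtered result.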

I’m sorry for my late reply.

Thanks for your answer.

So the texture cache is advantageous under either (or both) of the following two conditions:

  • Coalescing is imperfect
  • Bilinear interpolation is required

Is that right?

Still, I don’t know the details of the texture caching algorithm…

NVIDIA doesn’t publish the details, so all that anyone can do is guess.

The tex cache delivers its best bandwidth when the threads in each individual warp access values near each other in memory. The texture cache is too small to provide any meaningful temporal locality, and thread scheduling prevents spatial locality between threads in a block from contributing much.
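One way to get a feel for "near each other in memory" is to count how many aligned memory segments the 32 threads of a warp touch for a given access pattern. The host-side sketch below does that. The 32-byte segment size passed in the usage is purely an assumption for illustration (the real cache line size is not published), and the function names are hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Byte addresses read by the 32 lanes of one warp, where lane i reads the
// 4-byte element at index base_index + i * stride_elems.
inline std::vector<std::size_t> warp_addresses(std::size_t base_index,
                                               std::size_t stride_elems) {
    const std::size_t kElemBytes = 4;  // size of a 32-bit float element
    std::vector<std::size_t> out;
    for (std::size_t lane = 0; lane < 32; ++lane) {
        out.push_back((base_index + lane * stride_elems) * kElemBytes);
    }
    return out;
}

// Number of distinct aligned segments of segment_bytes that the addresses
// fall into. Fewer segments means tighter spatial locality within the warp.
inline std::size_t distinct_segments(const std::vector<std::size_t>& addrs,
                                     std::size_t segment_bytes) {
    std::set<std::size_t> segments;
    for (std::size_t a : addrs) {
        segments.insert(a / segment_bytes);
    }
    return segments.size();
}
```

For example, distinct_segments(warp_addresses(0, 1), 32) gives 4, while a stride of 8 elements gives 32: in the second pattern every lane lands in its own segment, so a small cache has little to share within the warp.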

If you want to read more of my musings on the texture cache, search the forums using Google:

http://www.google.com/search?q=site%3Aforu…lient=firefox-a

Thanks MisterAnderson42.

Very cool.
Your past discussions cover exactly what I wanted to know!
