Understanding Performance of 2D Texture memory accesses

I have a fairly large read-only array of single-precision floats (400 x 45000) that I am accessing through a 2-dimensional texture binding.
The array is partitioned in a column-cyclic manner over 45000 threads, and in the innermost loop each thread accesses the elements of its assigned column.
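Roughly, the access looks like the following sketch (simplified; the kernel and texture names are just placeholders):

```cuda
// Simplified sketch of the access pattern described above (placeholder names).
// The 400 x 45000 array is bound to a 2D texture; thread t walks down column t.
texture<float, 2, cudaReadModeElementType> dataTex;

__global__ void columnWalk(float *out, int nRows, int nCols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;    // one column per thread
    if (col >= nCols) return;

    float acc = 0.0f;
    for (int row = 0; row < nRows; ++row)
        acc += tex2D(dataTex, col + 0.5f, row + 0.5f);  // x = column, y = row
    out[col] = acc;
}
```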

When I ran the texture-memory version, performance dropped by 15% compared to reading the array directly from global memory.
To understand the degradation, I would like to know how the texture cache behaves for 2D accesses. Specifically:

  1. What is the policy for fetching data from memory into the texture cache? Is it row-major, column-major, or blocked?
  2. Is there an optimal way of traversing the 2D texture? I read somewhere that a space-filling-curve traversal may give the best performance. Is this accurate?
  3. What is the cache replacement policy? What is the cost of eviction?

Also, why is tex2D() omitted from the reference manual (2.1)? I spent a whole day trying to figure out the exact order of its parameters.

Any information would be really useful.

Thanks!
Rajesh

I believe it is a Z-curve. Simon Green once posted a link to a Wikipedia page explaining how it is done; this is the page: http://en.wikipedia.org/wiki/Z-order_(curve)
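In short, a Z-order (Morton) index interleaves the bits of the x and y coordinates, so texels that are close together in 2D end up close together in memory. Something along these lines (an illustration of the idea only; the exact layout the hardware uses is not documented and may differ):

```cuda
// Illustration of a Z-order (Morton) index: the bits of x and y are interleaved,
// so 2D-nearby texels map to nearby 1D addresses. The real hardware layout is
// undocumented and may differ in detail.
__host__ __device__ unsigned int mortonIndex(unsigned int x, unsigned int y)
{
    unsigned int idx = 0;
    for (unsigned int bit = 0; bit < 16; ++bit) {
        idx |= ((x >> bit) & 1u) << (2 * bit);      // x bits -> even positions
        idx |= ((y >> bit) & 1u) << (2 * bit + 1);  // y bits -> odd positions
    }
    return idx;
}
```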

Space-filling curves are the best way to store 2D or 3D data in a 1D memory layout (this is what tex2D is doing under the hood, as has already been pointed out).

The best way to access your 2D texture is to have the threads in each warp read nearby elements along a row of the texture. The next best way (only ~1-2% slower in microbenchmarks I've done) is to have the threads in each warp read nearby values going down a column.
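Concretely, a thread mapping like the sketch below (the texture reference and kernel names are placeholders) gives each warp a run of consecutive x coordinates on the same row:

```cuda
texture<float, 2, cudaReadModeElementType> texRef;

// Sketch: consecutive threads in a warp fetch consecutive texels along a row,
// so each warp's fetches stay within a small 2D region of the texture.
__global__ void rowCoherentRead(float *out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // consecutive within a warp
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        out[y * width + x] = tex2D(texRef, x + 0.5f, y + 0.5f);
}
```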

Denis and MisterAnderson42,

Thanks for the link and the thread-mapping strategy. I will try to reorganize my code to suit these constraints.

-regards,
Rajesh