Optimal 2D Locality Test for Texture Memory

Hi, all,

I am trying to understand 2D locality for texture memory use
and wondering when exactly optimal 2D locality is realized in terms of threads.

For example, take a look at the following serial code and two equivalent CUDA versions: one simulating 2D locality and the other 1D locality.

// serial code

float A[M][N];
float B[M][N];
for (int i = 0; i < M; i++) {
    for (int j = 0; j < N; j++) {
        A[i][j] = B[i][j];
    }
}

// CUDA 1 (2D locality simulation)

texture<float, 2, cudaReadModeElementType> texRef;
dim3 threads(16, 16);
dim3 grid(M/16, N/16);
__global__ void kernel(…) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    A[y * N + x] = tex2D(texRef, x, y);
}

// CUDA 2 (1D locality simulation)

texture<float, 2, cudaReadModeElementType> texRef;
dim3 threads(256, 1);
dim3 grid(M/256, N);
__global__ void kernel(…) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    A[y * N + x] = tex2D(texRef, x, y);
}

I tested these two simple programs but don’t see any performance difference.
Could you explain why, and in what case it could make a difference in performance?
Thanks,

Not sure if this is correct, but my understanding is this:

Texture cache (16 KB): elements neighboring an access point in all directions can be kept in the cache for more immediate access by other threads.
2D texture: neighbors in both dimensions are covered, so more irregular access patterns will benefit from the cache.
1D texture: only the front and back neighbors along a row can be cached. Very similar to global memory access.

Seems like your code is not fetching elements randomly (each thread reads exactly the element at its own (x, y)), so the access path is as fast as a regular coalesced global load in both cases.
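
For what it’s worth, here is the kind of access pattern where the 2D cache should start to matter. This is only a sketch (not benchmarked), and the kernel name and parameters are hypothetical; texRef is assumed to be bound to a 2D CUDA array holding the source image. Each warp samples along a rotated line, so the reads no longer walk down a single row of memory. A plain global load cannot be coalesced in that case, while the texture cache still captures the small (x, y) neighborhood each warp touches.

// Hypothetical example: sample a source image along a rotated grid.
texture<float, 2, cudaReadModeElementType> texRef;

__global__ void rotateKernel(float *out, int width, int height, float angle)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float s = sinf(angle), c = cosf(angle);
    // Rotate the sampling coordinate around the image center.
    float cx = x - width  * 0.5f;
    float cy = y - height * 0.5f;
    float u = c * cx - s * cy + width  * 0.5f;
    float v = s * cx + c * cy + height * 0.5f;

    // Neighboring threads read nearby but non-contiguous (u, v)
    // locations; the 2D texture cache can still serve them.
    out[y * width + x] = tex2D(texRef, u, v);
}

With your row-aligned copy kernel, by contrast, both block shapes produce fully coalesced reads, which is why you see no difference.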