Hello all.
I am developing a CUDA application where I have to evaluate a bunch of coefficients for each thread (it's a kind of ray tracer). To do this, I have a set of control points stored in a texture that has to be read in a deterministic manner by every thread.
I had the impression that this was a prime candidate for shared memory: allocate a shared block, let each thread do just one texfetch, and then have all threads read from the shared block. My timings, however, reveal that it is faster to let every thread do many texture fetches on its own. These texture fetches are deterministic and happen at the same time for every thread, so the access pattern is probably very cache-friendly. Or is there some point I am missing?
Example code (simplified to get the point across, and only valid for cubic surfaces):
__device__ void
calcCoeffs( float *res, const int d, const int numtiles )  // numtiles unused in this simplified listing
{
    // getuv() returns this thread's (u,v) parameter pair (helper omitted here).
    const float u = getuv().x;
    const float v = getuv().y;

    // Bernstein basis values in u and v (maxd is a compile-time bound on d+1,
    // defined elsewhere in the full code).
    float bu[maxd];
    float bv[maxd];
    evalBernsteinBasis( u, d, &bu[0] );
    evalBernsteinBasis( v, d, &bv[0] );

    // *** Set up a shared block. Let each thread load one element of the texture.
    // *** Reading outside the texture will result in a 0 being stored.
    __shared__ float4 TT[BLOCK_SIZE][BLOCK_SIZE];

    // Thread index
    const int tx = threadIdx.x;
    const int ty = threadIdx.y;

    TT[tx][ty] = texfetch( T, tx, ty );
    __syncthreads();

    // Weighted sum of control points. The float4 arithmetic below assumes the
    // usual vector operator overloads (e.g. from cutil_math.h).
    float4 sum = make_float4( 0.0f, 0.0f, 0.0f, 0.0f );
    for ( int i = 0; i < d+1; ++i ) {
        for ( int j = 0; j < d+1; ++j ) {
            // *** Reading the texture inside the loop this way is faster. Why?
            // float4 t = texfetch( T, i, j );
            // sum = sum + bu[i]*bv[j]*t;
            sum = sum + bu[i]*bv[j]*TT[i][j];
        }
    }

    // k is the per-thread output index (set up in the full code, omitted here).
    res[k*4+0] = sum.x;
    res[k*4+1] = sum.y;
    res[k*4+2] = sum.z;
    res[k*4+3] = sum.w;
}
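For context, the basis evaluation itself is cheap. Something along these lines is what the listing assumes for evalBernsteinBasis (a minimal sketch; the real helper is equivalent but was simplified away above):

__device__ void
evalBernsteinBasis( const float t, const int d, float *b )
{
    // Fills b[0..d] with B_i^d(t) = C(d,i) * t^i * (1-t)^(d-i)
    // using the triangular (de Casteljau-style) scheme.
    const float s = 1.0f - t;
    b[0] = 1.0f;
    for ( int j = 1; j <= d; ++j ) {
        float saved = 0.0f;
        for ( int i = 0; i < j; ++i ) {
            const float tmp = b[i];
            b[i] = saved + s*tmp;
            saved = t*tmp;
        }
        b[j] = saved;
    }
}

So the per-thread arithmetic before the double loop is tiny; the (d+1)*(d+1) control-point reads are what I am trying to optimise, which is why I expected the shared-memory version to win.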