A question about a strange performance degradation using texturing

KenwuD · February 17, 2011, 12:31am

Hi all,

I'm writing a CUDA algorithm that has to process data fetched from an array (‘D’) of floats in device memory. For several reasons the data are fetched from memory in random order (‘index’), so I bind a texture (‘tex_D_float’) to that array to improve performance. This is the source code:

..
texture<float, 1> tex_D_float;
..
// the size argument is in bytes
cudaBindTexture(NULL, tex_D_float, cu_vector_D, VectorDim * sizeof(float));
..
// launched as kernel1<<<....>>>(.....)
__global__ void kernel1(.....)
{
     float        tempFloatD;
     unsigned int index;
     ....
     [calculate "index" value]
     // random-order read of one float through the texture cache
     tempFloatD = tex1Dfetch(tex_D_float, index);
     ...
     [process data]
     ....
     [output data]
}

[number of threads=VectorDim]

With this code, each thread fetches only one float (tempFloatD).

In a second stage, I noticed that I could improve the processing algorithm by having each thread fetch and process two floats that are consecutive in memory (a float2) at a time.

Without a texture, fetching a float2 per thread is clearly faster than fetching and processing a single float per thread.

The strange behaviour is that this improvement does not carry over when I use a texture. With texture fetches, having each thread fetch a float2 is slower than having each thread fetch a single float.

This is the source code I used for fetching and processing float2 data:

..
texture<float2, 1> tex_D_float2;
..
// same size in bytes as before: VectorDim floats = VectorDim/2 float2 elements
cudaBindTexture(NULL, tex_D_float2, cu_vector_D, VectorDim * sizeof(float));
..
// launched as kernel2<<<....>>>(.....)
__global__ void kernel2(.....)
{
     float2       tempFloat2_D;
     unsigned int index;
     ....
     [calculate "index" value]
     // random-order read of one float2; index counts float2 elements
     tempFloat2_D = tex1Dfetch(tex_D_float2, index);
     ...
     [process data]
     ....
     [output data]
}

[number of threads=VectorDim/2]
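As a sanity check on the sizes and indexing in the two listings above, here is a minimal host-side sketch in plain C++ (no GPU needed). It only models the arithmetic: both bindings cover the same number of bytes, the float2 view has half as many elements (hence VectorDim/2 threads), and float2 element `index` covers floats `2*index` and `2*index+1`. The names `Float2`, `bindBytes`, `floatElems`, `float2Elems` and `fetchFloat2` are hypothetical helpers made up for this sketch; they are not part of CUDA or of my real code, and the sketch says nothing about texture cache behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for CUDA's float2, so this builds without CUDA headers.
struct Float2 { float x, y; };
static_assert(sizeof(Float2) == 2 * sizeof(float), "Float2 must be two packed floats");

// Size in bytes passed to cudaBindTexture: the same for both views of the array.
std::size_t bindBytes(std::size_t vectorDim) {
    return vectorDim * sizeof(float);
}

// Number of elements in the float view (one thread per element in kernel1).
std::size_t floatElems(std::size_t vectorDim) {
    return bindBytes(vectorDim) / sizeof(float);
}

// Number of elements in the float2 view (one thread per element in kernel2).
// Assumes vectorDim is even.
std::size_t float2Elems(std::size_t vectorDim) {
    return bindBytes(vectorDim) / sizeof(Float2);
}

// Host model of reading float2 element `index`: it covers floats 2*index and 2*index+1.
Float2 fetchFloat2(const std::vector<float>& d, std::size_t index) {
    assert(2 * index + 1 < d.size());
    Float2 v;
    std::memcpy(&v, d.data() + 2 * index, sizeof(Float2));
    return v;
}
```

For example, with VectorDim = 8 both bindings cover 32 bytes, the float view has 8 elements and the float2 view has 4, and float2 element 3 holds floats 6 and 7 of the original array.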

I tried this algorithm on devices with compute capability 1.1 and 1.3 and noticed the same performance degradation when using a texture to fetch float2 data.

Did I make a mistake in the code, or do CUDA devices really have this performance degradation when each thread fetches float2 data through a texture?

Please help me.

Thank you in advance,

Rocco
