Texture Memory Does Not Improve Speed

yaoshen · February 23, 2017, 1:26am

Hi,

I have run into a problem: texture memory does not improve speed compared with global memory. I need a 3D texture because the data is a 3D image. All the kernel does is read constantly from texture memory instead of from global memory.

The process is:

host part:

1) Read the 3D image and save it in a 1D array (say “ima”) by vectorization.
2) Load “ima” into 3D texture memory (“ima_tex”).

device part:

3) Read from “ima_tex”.

Below is the part of the code that declares the texture, loads the data into it, and reads from it. I don’t think there is anything wrong with the code, because everything runs and the results are correct. But I would like some ideas about why this (no speed improvement, and even a longer run time) may happen.

Declare and load:

    texture<float, cudaTextureType3D, cudaReadModeElementType> ima_tex;

    cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();
    cudaExtent imaSize = make_cudaExtent(dimx, dimy, dimz);
    cudaArray *ima = 0;

    // Image
    cudaMalloc3DArray(&ima, &channelDesc, imaSize);
    cudaMemcpy3DParms copyParams1 = {0};

    copyParams1.srcPtr = make_cudaPitchedPtr((void*)ima_input, imaSize.width*sizeof(float), imaSize.width, imaSize.height);
    copyParams1.dstArray = ima;                  // destination array
    copyParams1.extent = imaSize;                // dimensions of the transferred area in elements
    copyParams1.kind = cudaMemcpyHostToDevice;
    cudaMemcpy3D(&copyParams1);

    ima_tex.normalized = false;
    ima_tex.filterMode = cudaFilterModePoint;    // point sampling; the alternative is cudaFilterModeLinear
    // Wrap addressing is only supported with normalized coordinates;
    // the kernel below keeps every index in range itself.
    ima_tex.addressMode[0] = cudaAddressModeWrap;
    ima_tex.addressMode[1] = cudaAddressModeWrap;
    ima_tex.addressMode[2] = cudaAddressModeWrap;
    cudaBindTextureToArray(ima_tex, ima, channelDesc);

Read:

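    // Squared distance between the patch centered at (x,y,z) and the patch
    // centered at (nx,ny,nz). gcfg is defined elsewhere (not shown in this excerpt).
    __device__ float distance(float* ima, int x, int y, int z, int nx, int ny, int nz)
    {
        float d, distancetotal, temp;
        int i, j, k, ni1, nj1, ni2, nj2, nk1, nk2, f;

        f = gcfg->patchsize;    // patch radius: each loop runs from -f to f

        distancetotal = 0.f;
        for (k = -f; k <= f; k++)
        {
            nk1 = z + k;
            nk2 = nz + k;
            // Mirror out-of-range z indices back into the volume
            if (nk1 < 0) nk1 = -nk1;
            if (nk2 < 0) nk2 = -nk2;
            if (nk1 >= gcfg->dimz) nk1 = 2*gcfg->dimz - nk1 - 1;
            if (nk2 >= gcfg->dimz) nk2 = 2*gcfg->dimz - nk2 - 1;

            for (j = -f; j <= f; j++)
            {
                nj1 = y + j;
                nj2 = ny + j;
                // Same mirroring for y
                if (nj1 < 0) nj1 = -nj1;
                if (nj2 < 0) nj2 = -nj2;
                if (nj1 >= gcfg->dimy) nj1 = 2*gcfg->dimy - nj1 - 1;
                if (nj2 >= gcfg->dimy) nj2 = 2*gcfg->dimy - nj2 - 1;

                for (i = -f; i <= f; i++)
                {
                    ni1 = x + i;
                    ni2 = nx + i;
                    // Same mirroring for x
                    if (ni1 < 0) ni1 = -ni1;
                    if (ni2 < 0) ni2 = -ni2;
                    if (ni1 >= gcfg->dimx) ni1 = 2*gcfg->dimx - ni1 - 1;
                    if (ni2 >= gcfg->dimx) ni2 = 2*gcfg->dimx - ni2 - 1;

                    // Two texture fetches per voxel pair
                    temp = tex3D(ima_tex, ni1, nj1, nk1) - tex3D(ima_tex, ni2, nj2, nk2);
                    distancetotal = distancetotal + temp * temp;
                }
            }
        }
        d = distancetotal * gcfg->rpatchnomalize;
        return d;
    }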
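
For reference, the boundary handling repeated in each loop above can be isolated into one small helper. This is only a host-side C++ sketch (it should also compile under nvcc); `mirror_index` is a hypothetical name that does not appear in the original code, and the logic simply copies the two `if` statements used for each axis.

```cpp
#include <cassert>

// Hypothetical helper (not in the original post): folds an index back into
// [0, n) using the same two checks as the loops in distance().
// Valid for indices that overshoot the volume by less than n.
inline int mirror_index(int i, int n)
{
    assert(n > 0);
    if (i < 0)  i = -i;            // reflect below the lower edge
    if (i >= n) i = 2 * n - i - 1; // reflect above the upper edge
    return i;
}
```

With a helper like this, `nk1 = mirror_index(z + k, gcfg->dimz)` would replace the three lines that compute `nk1`. Since every index is folded into range before the fetch, the texture's own address mode is never exercised by this kernel.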

Robert_Crovella · February 23, 2017, 2:02am

I don’t know of anywhere (e.g. documentation) that says that “texture memory will always increase the speed of your program”.

It is a cache, with particular characteristics. However reading from global memory also goes through cache(s) with particular characteristics.

I haven’t tried to deduce the access pattern of your code. But access pattern will certainly be a predictor of cache efficiency. And it’s possible that the accesses associated with the texturing operation are an insignificant part of the total execution time of your code, meaning that changing the access method has little or no impact. Another possibility is that your working set is so small that it fits in a particular cache (or both types of caches), meaning that switching from ordinary global memory to texture memory might have no effect. Finally, some newer GPU architectures and their compiler support will automatically detect a read-only data operation and use something similar to the texturing path, even when your code would suggest ordinary global loads.
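
To make the working-set point concrete, here is a rough host-side C++ sketch of how one might estimate the bytes a single `distance()` call touches. The function names are hypothetical, and the cache capacity is a placeholder argument, not a figure for any particular GPU.

```cpp
#include <cassert>
#include <cstddef>

// Upper bound on the bytes one distance() call reads: two cubic patches,
// each (2*f + 1) floats on a side. The two patches may overlap, so the
// number of distinct bytes can be lower than this.
inline std::size_t patch_working_set_bytes(int f)
{
    assert(f >= 0);
    const std::size_t side = static_cast<std::size_t>(2 * f + 1);
    return 2 * side * side * side * sizeof(float);
}

// True when the estimate fits in a cache of the given capacity.
// cache_bytes is an assumption supplied by the caller; take the real
// value from the documentation for your device.
inline bool fits_in_cache(std::size_t bytes, std::size_t cache_bytes)
{
    return bytes <= cache_bytes;
}
```

For a patch radius of 2, each patch is 5 x 5 x 5 floats, so the bound is 2 x 125 x 4 = 1000 bytes per call. This counts one thread only; neighboring threads read overlapping patches, so the footprint of a whole block is larger than one patch but far smaller than the per-thread figure multiplied by the thread count.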

Such questions may be easier to answer by careful use of a profiler, rather than by inspection of code snippets.
