cudaBindTexture2D vs cudaBindTextureToArray

lars · October 23, 2009, 7:33am

Hi,

Is there any performance difference between using cudaBindTexture2D and cudaBindTextureToArray when accessing 2D textures? If not, what is the point of using 2D arrays?

Also, are there any row alignment (pitch) requirements when using cudaBindTexture2D? Will it work full speed with any pitch, work but run slower with unaligned rows, or fail to work at all with unaligned rows?

What is the bandwidth difference between cudaMemcpy and cudaMemcpy2D when doing host->device memory transfers? Is it better to align rows on the host and do a simple cudaMemcpy(), or to let cudaMemcpy2D() do the GPU memory row alignment?

I guess I should just run some benchmarks myself, but if someone has already figured this out, some comments would be great.

/Lars

Skribtsov · October 23, 2009, 9:45am

We haven’t done detailed measurements either, but I can share some related experience here:

a) Once data is in texture memory, it does not really matter how it got there. Texture memory is cached, so alignment does not really matter.

b) I think a simple cudaMemcpy works faster than the 2D variant, so yes, if you can store the data aligned on the CPU, do it.

c) The point of using 2D arrays is that they have a stride, which is convenient for coalesced memory access patterns when the data resides in global memory. (Note that loading from global memory in a contiguous, coalesced way is almost as fast as loading from texture.)

d) Off-topic: by the way, newer devices are less sensitive to non-coalesced memory access!
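
For anyone who wants to measure point b) rather than guess, here is a minimal timing sketch. It is not from the posters above: the image size and the `padRows` helper are made-up placeholders, and it assumes pinned host buffers plus a device allocation from cudaMallocPitch. Note that the pre-padded copy moves `pitch * height` bytes, slightly more than the packed copy, so the two numbers are not measuring exactly the same payload.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,       \
                    cudaGetErrorString(err_));                       \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

// Hypothetical helper: copy tightly packed rows into rows `pitch` bytes apart.
static void padRows(char *dst, size_t pitch, const char *src,
                    size_t rowBytes, size_t height)
{
    for (size_t y = 0; y < height; ++y)
        memcpy(dst + y * pitch, src + y * rowBytes, rowBytes);
}

int main()
{
    const size_t width = 1000, height = 1000;       // assumed image size, in floats
    const size_t rowBytes = width * sizeof(float);  // tightly packed row size

    // Device image whose row pitch is chosen by the runtime.
    char *dImg = NULL;
    size_t pitch = 0;
    CHECK(cudaMallocPitch((void **)&dImg, &pitch, rowBytes, height));

    // Pinned host buffers: one packed, one pre-padded to the device pitch.
    char *hPacked = NULL, *hPadded = NULL;
    CHECK(cudaMallocHost((void **)&hPacked, rowBytes * height));
    CHECK(cudaMallocHost((void **)&hPadded, pitch * height));
    memset(hPacked, 1, rowBytes * height);
    memset(hPadded, 0, pitch * height);
    padRows(hPadded, pitch, hPacked, rowBytes, height);

    cudaEvent_t t0, t1, t2;
    CHECK(cudaEventCreate(&t0));
    CHECK(cudaEventCreate(&t1));
    CHECK(cudaEventCreate(&t2));

    // Warm-up copy so one-time initialization cost is not measured.
    CHECK(cudaMemcpy(dImg, hPadded, pitch * height, cudaMemcpyHostToDevice));

    CHECK(cudaEventRecord(t0));
    // Option 1: the runtime inserts the row padding during the transfer.
    CHECK(cudaMemcpy2D(dImg, pitch, hPacked, rowBytes, rowBytes, height,
                       cudaMemcpyHostToDevice));
    CHECK(cudaEventRecord(t1));
    // Option 2: rows were padded on the host, so one linear copy suffices.
    CHECK(cudaMemcpy(dImg, hPadded, pitch * height, cudaMemcpyHostToDevice));
    CHECK(cudaEventRecord(t2));
    CHECK(cudaEventSynchronize(t2));

    float ms2D = 0.0f, ms1D = 0.0f;
    CHECK(cudaEventElapsedTime(&ms2D, t0, t1));
    CHECK(cudaEventElapsedTime(&ms1D, t1, t2));
    printf("pitch = %zu bytes (row = %zu bytes)\n", pitch, rowBytes);
    printf("cudaMemcpy2D: %.3f ms, cudaMemcpy: %.3f ms\n", ms2D, ms1D);

    CHECK(cudaEventDestroy(t0));
    CHECK(cudaEventDestroy(t1));
    CHECK(cudaEventDestroy(t2));
    CHECK(cudaFreeHost(hPacked));
    CHECK(cudaFreeHost(hPadded));
    CHECK(cudaFree(dImg));
    return 0;
}
```

A single timed copy is noisy, so repeating each copy in a loop and averaging would give more trustworthy numbers. The time spent in `padRows` on the host is not included here and should be counted if the padding has to be redone for every transfer.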

Simon_Green · October 23, 2009, 1:50pm

Yes, there is a difference. cudaBindTextureToArray() uses cudaArrays, which are stored in a special memory layout that is optimized for texture fetches with 2D locality. The only problem is that you can’t directly write to CUDA arrays (you have to use cudaMemcpyToArray).

cudaBindTexture2D() is a recent addition that allows you to bind any piece of global memory as a 2D texture (we sometimes call this pitch linear texturing). This is convenient since you can write directly to this memory, but since the data is laid out linearly, the fetch performance can be lower, depending on the access pattern.

I’d recommend testing both to see which is faster.
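
As a starting point for such a test, the two binding paths might be set up roughly as follows. This is only a sketch under stated assumptions: the kernel `copyFromTexture`, the helper `countMismatches`, and the image size are hypothetical, and it uses the legacy texture reference API discussed in this thread, which was removed in CUDA 12, so it needs a CUDA 11 or older toolkit. It also fills the cudaArray with cudaMemcpy2DToArray rather than cudaMemcpyToArray.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,       \
                    cudaGetErrorString(err_));                       \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

// Legacy texture references have to be declared at file scope.
texture<float, 2, cudaReadModeElementType> texPitch;  // pitch-linear global memory
texture<float, 2, cudaReadModeElementType> texArray;  // cudaArray

// Hypothetical kernel: each thread fetches one texel (+0.5f is the texel center).
__global__ void copyFromTexture(float *out, int width, int height, bool useArray)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    out[y * width + x] = useArray ? tex2D(texArray, x + 0.5f, y + 0.5f)
                                  : tex2D(texPitch, x + 0.5f, y + 0.5f);
}

// Hypothetical host helper: count elements that differ between two buffers.
static int countMismatches(const float *a, const float *b, int n)
{
    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (a[i] != b[i]) ++bad;
    return bad;
}

int main()
{
    const int width = 512, height = 256;            // assumed image size
    const size_t rowBytes = width * sizeof(float);
    const int n = width * height;
    float *h = (float *)malloc(n * sizeof(float));
    float *hOut = (float *)malloc(n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();

    // Path 1: pitch-linear memory (kernels can write to it directly).
    float *dPitch = NULL;
    size_t pitch = 0, offset = 0;
    CHECK(cudaMallocPitch((void **)&dPitch, &pitch, rowBytes, height));
    CHECK(cudaMemcpy2D(dPitch, pitch, h, rowBytes, rowBytes, height,
                       cudaMemcpyHostToDevice));
    CHECK(cudaBindTexture2D(&offset, texPitch, dPitch, desc, width, height, pitch));

    // Path 2: cudaArray (filled through a copy, not written by kernels here).
    cudaArray *dArray = NULL;
    CHECK(cudaMallocArray(&dArray, &desc, width, height));
    CHECK(cudaMemcpy2DToArray(dArray, 0, 0, h, rowBytes, rowBytes, height,
                              cudaMemcpyHostToDevice));
    CHECK(cudaBindTextureToArray(texArray, dArray, desc));

    float *dOut = NULL;
    CHECK(cudaMalloc((void **)&dOut, n * sizeof(float)));
    dim3 block(16, 16), grid((width + 15) / 16, (height + 15) / 16);
    for (int useArray = 0; useArray < 2; ++useArray) {
        copyFromTexture<<<grid, block>>>(dOut, width, height, useArray != 0);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpy(hOut, dOut, n * sizeof(float), cudaMemcpyDeviceToHost));
        printf("%s: %d mismatches\n", useArray ? "cudaArray" : "pitch-linear",
               countMismatches(h, hOut, n));
    }

    CHECK(cudaUnbindTexture(texPitch));
    CHECK(cudaUnbindTexture(texArray));
    CHECK(cudaFree(dPitch));
    CHECK(cudaFreeArray(dArray));
    CHECK(cudaFree(dOut));
    free(h);
    free(hOut);
    return 0;
}
```

The kernel here reads each texel exactly once in row order, which says little about cache behaviour. To compare the two layouts meaningfully, swap in the access pattern of the real application and time each launch with CUDA events.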

lars · October 24, 2009, 2:53am

Thanks, that’s useful information. I’ll try both versions to see if the memcpy to an array is worth it for me.

I’d like to better understand how 2D locality caching works, though. I have pretty much the same question as the one raised in this thread:

The Official NVIDIA Forums | NVIDIA

In short, will I be wasting cache memory when using 2D texture arrays if the threads in my warps (and blocks) are accessing texture elements in a linear (1D) pattern, i.e., along the x axis? (Each thread in a block will be accessing a different x coordinate along the same row.) Will a 2D texture access cache elements from the rows above and below? If yes, are these cached elements likely to be reused by other warps later on, or are they likely to be evicted by other linear accesses to the same row by other warps in the block?

I’m not sure my questions are making any sense to you… If not, is it still possible to give some general advice to keep in mind in order to use the available cache memory efficiently when accessing 2d textures?

/Lars

Quoc_Vinh · October 24, 2009, 3:23am

Thanks, Simon Green,

Your answer is exactly what I had thought.

When CUDA 2.2 was released, I built a program to evaluate the performance of the cudaBindTexture2D() and cudaBindTextureToArray() functions.

After experimenting, I found that accessing texture data bound with cudaBindTextureToArray() is faster than accessing data bound with cudaBindTexture2D().
