When to use textures

I am trying to write an image processing library with CUDA. When is it best to use texture memory as opposed to global memory when writing image filters (Gaussian, Sobel, etc.)? I know that texture memory is cached, so should I always be using it? Why doesn't the convolutionSeparable sample use texture memory?

If I were to use texture memory for all my images (8-bit int mono, 16-bit int mono, 32-bit float mono), how would I chain filters without recopying the data? For example, I want to compute a gradient image, keep it resident on the device, and then use it as input to another filter (there is no point in copying it back to system memory only to copy it back to the device again). The gradient image would have to be generated into global memory, so how would I treat that same memory as texture memory in the next pass? Would I have to do a device-to-device transfer to accomplish this? Is that slow? I used to do this all the time in Direct3D without an explicit copy, so would this device-to-device transfer in CUDA make my filter chain run slower than Direct3D?

A coalesced global memory read will be faster than a texture read. Only use textures when you really need the cache because you can't coalesce your reads.

If you do need a 2D texture read/modify, then you need to write to global memory and do a device-to-device copy. There are a few other recent threads on this topic. Another option (if you don't need a 2D cache) is to use a 1D texture bound to device memory and write directly back into it. As long as each thread only reads the same values it writes, there are no race conditions involving memory accesses.
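A minimal sketch of that second pattern, using the legacy texture reference API that was current at the time of this thread (it was deprecated and later removed in CUDA 12 in favour of texture objects). The kernel and function names are illustrative; the point is that the texture is bound to the same linear device buffer the kernel writes into, and each thread reads only the element it writes:

```cuda
#include <cuda_runtime.h>

// Legacy texture reference bound to linear device memory.
texture<float, 1, cudaReadModeElementType> texRef;

__global__ void scaleInPlace(float *d_data, int n, float s)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Each thread reads only the element it writes back,
        // so the read-modify-write has no inter-thread race.
        float v = tex1Dfetch(texRef, i);
        d_data[i] = v * s;
    }
}

void runInPlace(float *d_data, int n)
{
    cudaBindTexture(0, texRef, d_data, n * sizeof(float));
    scaleInPlace<<<(n + 255) / 256, 256>>>(d_data, n, 2.0f);
    cudaUnbindTexture(texRef);
}
```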

If your access pattern is predictable, you might get the best efficiency with coalesced global memory reads (of size float2 or int2) and shared memory. If there is some randomness (but locality) in your reads, textures might be the way to go.

A device-to-device copy is very fast; I doubt it will ever become the bottleneck of your algorithm. Still, it does take some time, of course, and it can be avoided for things like convolution over a small area.
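To make the chaining step concrete: a sketch of staging a gradient image (already in global memory from pass A) into the cudaArray backing pass B's 2D texture, entirely on the device. The function and buffer names here are hypothetical:

```cuda
#include <cuda_runtime.h>

// Hypothetical chaining step: pass A wrote d_grad to global memory.
// Copy it into the cudaArray backing pass B's 2D texture with a
// device-to-device transfer -- no host round trip involved.
void stageForTexture(cudaArray *texArray, const float *d_grad,
                     int width, int height)
{
    cudaMemcpy2DToArray(texArray, 0, 0, d_grad,
                        width * sizeof(float),          // source pitch
                        width * sizeof(float), height,  // row bytes, rows
                        cudaMemcpyDeviceToDevice);
}
```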

This is not an answer to Ryan's question; in fact, it is my own question. I thought I would add it in a relevant place rather than start a new thread.

Here is my noob question:

I have never used textures before. All I know is that they are cached and hence could be faster if used in the right way.

One of my kernels accesses an array in global memory as read-only (the array was generated by a previous kernel). Each element of this array is accessed only once, with no repetition. In such a case, is there any benefit to using a texture instead of global memory? Is there a way for the host application to move these arrays into "constant" memory somehow?

Thanks for your time.

Yes, there will be an improvement in data throughput using tex1Dfetch bound to global memory, as long as threads in a warp access nearby elements in the texture.

But then, I access each element only once. Doesn't the texture cache work like a CPU cache, in the sense that the first access comes from main memory and only subsequent accesses hit the cache?

So I thought that if I access each element only once, there is no benefit from the cache (unless one cache miss fetches a whole lot of other neighbouring elements).

And if you could clarify my second question, on moving "global memory" to "constant memory", that would be great!

Try this for copying data from global memory space into constant memory space (const_name should be predeclared with an appropriate size): cudaMemcpyToSymbol( const_name, gmem_addr, nBytes, 0, cudaMemcpyDeviceToDevice );
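In context, that call looks like the sketch below. The array name and size are placeholders; the destination must be a file-scope __constant__ declaration (constant memory is limited to 64 KB in total). Note that current CUDA versions take the symbol itself rather than its quoted name — the string form was removed in CUDA 5.0:

```cuda
#include <cuda_runtime.h>

// Constant-memory destination, declared at file scope with a fixed size
// (placeholder name and size for illustration).
__constant__ float const_name[1024];

// Copy nBytes from an existing device buffer into constant memory.
void loadConstants(const float *gmem_addr, size_t nBytes)
{
    cudaMemcpyToSymbol(const_name, gmem_addr, nBytes, 0,
                       cudaMemcpyDeviceToDevice);
}
```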

As for your first question: if all your reads are coalesced, then you will probably gain nothing from using textures.

I have never used constant memory or textures before, so I am going to try tomorrow.

Thanks for all your replies.