Benefits of Texture Memory: couldn't use them...

Hi,

I was experimenting with device memory reads through texture fetching. In the programming guide (section 5.4), one of the listed benefits is that texture reads "are not subject to the constraints on memory access patterns that global or constant memory reads must respect in order to get good performance". So I took the CUDA file provided by MisterAnderson (Nvidia Topic), which provides a bandwidth check, and changed the access pattern from
const unsigned int idx = threadIdx.x + blockIdx.x * blockDim.x;
to
const unsigned int idx = threadIdx.x * blockDim.x + blockIdx.x;
hoping that the access would be almost as fast as the programming guide promises. The bandwidth dropped from 30 GiB/s (read-only texture) to 8 GiB/s on my 8800 GTS. I saw similar behavior for the other data types and read methods.
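For context, the kernel in such a bandwidth test uses "idx" roughly like this (a minimal, compilable sketch using the texture reference API of that CUDA generation; tex_data, copy_tex and the buffer names are placeholders, not the actual code from the topic):

#include <cuda_runtime.h>

// 1D texture bound to linear device memory.
texture<float, 1, cudaReadModeElementType> tex_data;

__global__ void copy_tex(float *d_out, int n)
{
    // Coalesced-style pattern: consecutive threads fetch consecutive texels.
    const unsigned int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx < n)
        d_out[idx] = tex1Dfetch(tex_data, idx);
}

int main()
{
    const int n = 5 * 512;
    float *d_in, *d_out;
    cudaMalloc((void**)&d_in, n * sizeof(float));
    cudaMalloc((void**)&d_out, n * sizeof(float));
    cudaBindTexture(0, tex_data, d_in, n * sizeof(float));
    copy_tex<<<5, 512>>>(d_out, n);
    cudaThreadSynchronize();
    cudaUnbindTexture(tex_data);
    cudaFree(d_in); cudaFree(d_out);
    return 0;
}

Swapping the idx line for the strided variant is the only change in the experiment.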

Did I misunderstand something in the programming guide? Why do device memory reads through texture fetching behave similarly to normal device memory reads if they are not subject to any constraints?

Cem

You don't have the constraints of memory banks and coalescing, but if you are going to read N values that are not close together (and don't repeat), it is going to take the same time as global memory.

A texture fetch goes through a localized cache, so when reading random values that are close together, you will benefit from the cache.

Okay, I understand. But suppose someone wants to apply a filter to an image that is stored, let's say, column-wise. Then the filter has to touch areas that are not close together at all. How does CUDA handle that?

Thanks for the reply.

For image filters I usually use 2D textures, which are pretty fast when 2D locality is given. Filters also reuse the same data for different calculations, so you profit from the texture cache as well.
You have to copy the image data into a cudaArray first to be able to use 2D textures, though.
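The setup looks roughly like this (a minimal sketch; tex_img, blur3x3 and setup_texture are placeholder names, and the 3x3 box filter is just for illustration):

#include <cuda_runtime.h>

texture<float, 2, cudaReadModeElementType> tex_img;

__global__ void blur3x3(float *d_out, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    // Neighboring threads fetch overlapping texels, so most of these
    // nine reads are served from the 2D-local texture cache.
    float sum = 0.0f;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            sum += tex2D(tex_img, x + dx + 0.5f, y + dy + 0.5f);
    d_out[y * w + x] = sum / 9.0f;
}

void setup_texture(const float *h_img, int w, int h, cudaArray **arr)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaMallocArray(arr, &desc, w, h);
    cudaMemcpyToArray(*arr, 0, 0, h_img, w * h * sizeof(float),
                      cudaMemcpyHostToDevice);
    tex_img.addressMode[0] = cudaAddressModeClamp;  // clamp at the borders
    tex_img.addressMode[1] = cudaAddressModeClamp;
    tex_img.filterMode = cudaFilterModePoint;       // no interpolation
    cudaBindTextureToArray(tex_img, *arr, desc);
}

The clamped addressing also spares you explicit border handling in the kernel.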

Hmmmm, okay. I was thinking of using texture memory for linear algebra routines, such as matrix-vector multiplication. But I think 2D locality is not given there, so caching does not really help to improve performance.

Thanks for the replies.

Think of the cache as a way to read “almost coalesced” values. If all 32 threads in a warp read values close to each other in the texture (where close can be 1D or 2D depending on the texture type), then you will achieve maximum throughput as if you had a coalesced read. For particular linear algebra routines, this might be convenient if coalescing is difficult.
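As an illustration (a sketch, not a claim about the fastest approach): y = A*x with A stored column-major so the A reads coalesce, and x fetched through a 1D texture; all names here are placeholders.

texture<float, 1, cudaReadModeElementType> tex_x;

// y = A * x, A stored column-major (rows x cols), one thread per row.
__global__ void matvec(const float *A, float *y, int rows, int cols)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;

    float acc = 0.0f;
    for (int j = 0; j < cols; ++j) {
        // Every thread of the warp fetches the same x[j] in the same
        // iteration; after one miss, the rest hit the texture cache.
        acc += A[j * rows + row] * tex1Dfetch(tex_x, j);
    }
    y[row] = acc;
}

// Host side: cudaBindTexture(0, tex_x, d_x, cols * sizeof(float));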

Well, the two expressions are totally different.

Say you are launching 5 blocks with 512 threads, with each thread corresponding to one element. The total number of elements would be 5*512 = 2560.

The first way of calculating "idx" is straightforward; I don't need to explain anything there.

The second case: consider thread 511 in the block with blockIdx.x == 4. The expression evaluates to 511*512 + 4 = 261636, which is far beyond the 2560 valid elements. The index is totally meaningless.

That is probably why you are seeing strange bandwidth numbers.
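If the goal was an in-range but uncoalesced pattern, multiplying by gridDim.x instead of blockDim.x would give one (a sketch, assuming one element per thread; copy_strided is a placeholder name):

__global__ void copy_strided(float *d_out, const float *d_in, int n)
{
    // With 5 blocks of 512 threads this is a permutation of 0..2559:
    // thread 511 of block 4 maps to 511*5 + 4 = 2559.
    // Still uncoalesced, but every index stays in range.
    const unsigned int idx = threadIdx.x * gridDim.x + blockIdx.x;
    if (idx < n)
        d_out[idx] = d_in[idx];
}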