Confusion on using texture?

CUDAkk · July 23, 2009, 8:01am

Hi All,

Can excessive use of texture memory in a program cause low performance?

Electro · July 23, 2009, 9:02am

Yes, that could happen.

Check the SDK’s “deviceQuery” sample; it will tell you your maximum amount of constant memory (65536 bytes for me).
I assume this value is the amount of cached memory. If more is used, I guess the data will be taken from global memory, which is much slower.

CUDAkk · July 23, 2009, 9:47am

I am talking about texture memory, not constant memory.

Is there any limit on texture memory usage?

When I run the SDK’s “deviceQuery” sample, I get the following:

CUDA Device Query (Runtime API) version (CUDART static linking)

There is 1 device supporting CUDA

Device 0: “Quadro CX”

CUDA Capability Major revision number: 1

CUDA Capability Minor revision number: 3

Total amount of global memory: 1610285056 bytes

Number of multiprocessors: 24

Number of cores: 192

Total amount of constant memory: 65536 bytes

Total amount of shared memory per block: 16384 bytes

Total number of registers available per block: 16384

Warp size: 32

Maximum number of threads per block: 512

Maximum sizes of each dimension of a block: 512 x 512 x 64

Maximum sizes of each dimension of a grid: 65535 x 65535 x 1

Maximum memory pitch: 262144 bytes

Texture alignment: 256 bytes

Clock rate: 1.19 GHz

Concurrent copy and execution: Yes

Run time limit on kernels: Yes

Integrated: No

Support host page-locked memory mapping: Yes

Compute mode: Default (multiple host threads can use this device simultaneously)

Test PASSED

Press ENTER to exit…

avidday · July 23, 2009, 10:43am

Texture memory is just global memory with an on-GPU read cache. When usage patterns are not optimal for the cache, texture memory can be slower than global memory because you incur a cache miss penalty in addition to the normal global memory load latency.

CUDAkk · July 23, 2009, 10:52am

Could you explain a bit more what a cache miss is?

Nico · July 23, 2009, 10:59am

http://en.wikipedia.org/wiki/Cache

N.
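To make the term concrete: a cache miss is a read whose data is not already in the cache, so it has to be fetched from the slower memory behind it. Below is a minimal host-side C++ sketch of a toy direct-mapped cache that counts hits and misses for a sequence of element indices. The names `HitStats` and `simulate_cache` and the cache geometry are made up for illustration; this is not a model of how the GPU texture cache is actually organized.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hit/miss counters for one simulated access sequence.
struct HitStats {
    std::size_t hits;
    std::size_t misses;
};

// Toy direct-mapped cache (hypothetical, for illustration only):
// num_lines cache lines, each holding line_elems consecutive elements.
HitStats simulate_cache(const std::vector<std::size_t>& addresses,
                        std::size_t num_lines,
                        std::size_t line_elems) {
    assert(num_lines > 0 && line_elems > 0);
    const std::size_t empty = static_cast<std::size_t>(-1);
    std::vector<std::size_t> tags(num_lines, empty);  // which memory line each slot holds
    HitStats stats{0, 0};
    for (std::size_t addr : addresses) {
        const std::size_t line = addr / line_elems;  // memory line containing this element
        const std::size_t slot = line % num_lines;   // the only slot that line may occupy
        if (tags[slot] == line) {
            ++stats.hits;    // data already cached: fast path
        } else {
            ++stats.misses;  // cache miss: fetch the line and evict the old one
            tags[slot] = line;
        }
    }
    return stats;
}
```

With 4 lines of 8 elements each, reading indices 0 to 63 in order misses only once per line, while reading 64 indices with a stride of 8 misses on every access, because each read lands on a line that is not cached.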

MisterAnderson42 · July 23, 2009, 11:23am

But global memory is completely uncached…

In my experience, completely random reads are about 3 times faster from the texture cache than from straight global memory reads (yes, even on G200 hardware). With optimal access patterns from the texture you can hit the same device memory bandwidth limits that you do with global memory reads.

CUDAkk · July 23, 2009, 11:34am

Actually, in my program I have to perform 2400*1800 scattered texture memory reads, which take half of my program's execution time. Since my reads are scattered, I cannot use shared memory.

Could you guide me on how to reduce scattered reads?

MisterAnderson42 · July 23, 2009, 11:40am

My solution for scattered reads is to reorder the data in memory so that nearby threads are likely to read nearby values in the texture.
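One common way to get that kind of locality for 2D data is to sort elements along a space-filling curve such as Morton (Z-order). The post does not say which ordering is used, so the host-side C++ sketch below is only an assumption about one possible approach. The names `spread_bits`, `morton_key` and `morton_order` are hypothetical, and coordinates are assumed to fit in 16 bits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Spread the low 16 bits of v so that a zero bit separates each original bit.
static std::uint32_t spread_bits(std::uint32_t v) {
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Morton (Z-order) key: bits of x go to even positions, bits of y to odd ones.
// Points that are close in 2D tend to get close keys.
std::uint32_t morton_key(std::uint32_t x, std::uint32_t y) {
    assert(x < 65536u && y < 65536u);  // assumption: 16-bit coordinates
    return spread_bits(x) | (spread_bits(y) << 1);
}

// Returns a permutation: order[i] is the original index of the element
// that should be stored at position i after reordering.
std::vector<std::size_t> morton_order(const std::vector<std::uint32_t>& xs,
                                      const std::vector<std::uint32_t>& ys) {
    assert(xs.size() == ys.size());
    std::vector<std::size_t> order(xs.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                         return morton_key(xs[a], ys[a]) < morton_key(xs[b], ys[b]);
                     });
    return order;
}
```

The returned permutation would be applied once on the host (or in a separate pass) to both the data array and whatever indices refer to it, so that threads with neighbouring IDs end up fetching neighbouring texture elements. Whether this pays off depends on how often the access pattern changes relative to the cost of re-sorting.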

CUDAkk · July 23, 2009, 11:55am

I was thinking the same, but one thing confuses me: if I write an extra kernel to reorder the memory,

is it efficient to have a large array of size 2400 * 1800 * sizeof(float) that is accessed in two different kernels?

laxsu19 · July 23, 2009, 12:04pm

I’m having similar problems (http://forums.nvidia.com/index.php?showtopic=102492),
and texture reads also seem slower, and seem to kill my program execution.

Like you, I am merely using the texture to add some caching in case there is a chance of a near-hit; otherwise the index into the textured array I am accessing is purely random, by design of the code. It’s not exactly what CUDA is made for, I know, but that doesn’t mean I won’t try…

CUDAkk · July 23, 2009, 4:49pm

Have you had any success?

laxsu19 · July 23, 2009, 5:14pm

Not thus far. I have 5 sites in the program where tex1Dfetch can be called instead of a linear array index read from global memory.

If I enable the first four of these (by uncommenting them), it works, with a consistent 0.1 sec time penalty.

If I enable the last one, which is really no different from the others, the code hits the 5-second timeout error.

This even happens if the first four are commented out and JUST the last one is enabled. Weird, huh?

byung · September 3, 2009, 3:53pm

MisterAnderson42,

I always appreciate your contribution to the community.

I would like to ask what the optimal access pattern for texture memory is, in your professional experience.

It would be great to see a simple loop nest that exhibits an optimal memory access pattern for texture memory.

Thanks again,

CUDAkk · September 4, 2009, 4:31am

It is generally suggested to use texture memory instead of global memory, especially if you have scattered memory reads.

If you have a 1D texture, use tex1Dfetch() to read the data; this gives better performance compared to a global memory read (see page 113 of the CUDA Programming Guide 2.3).
