Confusion about using textures?

Hi All,

Can excessive use of texture memory in a program cause low performance?

Yes, that could happen.

Check the SDK’s “deviceQuery” sample; it will tell you your maximum amount of constant memory (65536 bytes for me).
I assume this value is the amount of cached memory. If more is used, I guess data will be taken from global memory, which is much slower.

I am talking about texture memory, not constant memory.

Is there any limitation on texture memory usage?

When I run the SDK’s “deviceQuery” sample, I get the following:

Texture memory is just global memory with an on-GPU read cache. When the access pattern is not friendly to the cache, texture reads can actually be slower than plain global memory reads, because you pay the cache-miss penalty on top of the normal global memory load latency.
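To make that concrete, here is a minimal host-side sketch using the texture reference API of that era (CUDA 2.x); the names texRef and bindInput are placeholders, not anything from the SDK. The “texture” is not a separate memory at all: you bind an existing linear global allocation to a texture reference, and fetches from it simply go through the on-chip cache.

#include <cuda_runtime.h>

// Legacy texture reference bound to plain linear global memory.
// Kernel reads done with tex1Dfetch() on this reference go through
// the on-chip texture cache; a miss still costs a full global load.
texture<float, 1, cudaReadModeElementType> texRef;

// Host-side helper (error checking omitted): no data is copied,
// the existing global allocation is just given a cached "view".
void bindInput(const float *d_data, int n)
{
    cudaBindTexture(0, texRef, d_data, n * sizeof(float));
}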

Could you explain a bit more what a cache miss is?

http://en.wikipedia.org/wiki/Cache

N.

But global memory is completely uncached…

In my experience, completely random reads are about 3 times faster from the texture cache than from straight global memory reads (yes, even on G200 hardware). With optimal access patterns from the texture, you can hit the same device memory bandwidth limits that you do with global memory reads.

Actually, in my program I have to do 2400*1800 scattered texture memory reads, which take half of my program’s execution time. Since my reads are scattered, I cannot use shared memory.

Could you guide me on how to reduce the scattered reads?

My solution for scattered reads is to reorder the data in memory so that nearby threads are likely to read nearby values in the texture.
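As a rough sketch of that idea (the permutation table order and the kernel name are hypothetical, not from this thread): run one gather pass that pulls the scattered values into a contiguous layout, so the main kernel afterwards reads neighbouring elements from neighbouring threads and the texture cache can actually help.

// Hypothetical reordering pass: "order" maps each new contiguous position
// to the old scattered index. The scattered reads happen once here; the
// writes are coalesced, and the consuming kernel can then read "reordered"
// with good locality (or through a texture with far fewer cache misses).
__global__ void reorderByPermutation(const float *src, const int *order,
                                     float *reordered, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        reordered[i] = src[order[i]];
}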

I am thinking the same, but one point of confusion if I write an extra kernel for reordering the memory:

Is it efficient to use a large array of size 2400 * 1800 * sizeof(float) that is accessed by two different kernels?

I’m having similar problems ([topic=“102492”]http://forums.nvidia.com/index.php?showtopic=102492[/topic]),
and texture reads also seem slower; they seem to kill my program’s execution.

I’m in the same position as you: I am merely using the texture to add some caching in case there is a chance of a near-hit; otherwise, the index of the textured array I am accessing is purely random, by design of the code. It’s not exactly what CUDA is made for, I know, but that doesn’t mean I won’t try…

Have you had any success?

Not thus far. I have 5 sites in the program where tex1Dfetch() can be called instead of a linear array index read from global memory.

If I enable (i.e., uncomment) the first four of these, it works, with a consistent 0.1-second time penalty.

If I enable the last one, which is really no different from the others, the code hits the 5-second timeout error.

This even happens if the first four are commented out and JUST the last one is enabled. Weird huh?

MisterAnderson42,

I always appreciate your contribution to the community.

I would like to ask: from your professional experience, what is the optimal access pattern for texture memory?

It would be appreciated if you could show a simple loop nest that exhibits an optimal memory access pattern for texture memory.

Thanks again,

It is generally suggested to use texture memory instead of global memory, especially if you have scattered memory reads.

If you have a 1D texture, then use tex1Dfetch() to read the data; this gives better performance compared to a global memory read (see page 113 of the CUDA Programming Guide 2.3).
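For illustration, here is the same scattered gather written both ways, as a small sketch (inTex and the kernel names are made-up; it assumes the input buffer has already been bound to the texture reference with cudaBindTexture, as in the earlier snippet):

// 1D texture reference over the linear input buffer.
texture<float, 1, cudaReadModeElementType> inTex;

// Plain global memory version: each scattered read goes straight to DRAM.
__global__ void gatherGlobal(const float *in, const int *idx, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[idx[i]];
}

// Texture version: the same read, but served through the texture cache,
// which helps whenever nearby threads happen to fetch nearby indices.
__global__ void gatherTexture(const int *idx, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch(inTex, idx[i]);
}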