I started CUDA three weeks ago and I’ve already sped up my algorithm by a factor of 100. So, it’s cool :)
However, I’ve been reading the “programming guide” and many things in it are pretty hard for me to understand.
I learn something every day, and today I’d like to understand what a texture is. I mean, I know what a texture is in computer graphics, but I think in CUDA it is a specific “structure” or something like that, with interesting properties.
Can you teach me please?
Thanks a lot,
In CUDA, a texture is a way to get read-only access to some portion of global memory with a smaller penalty for non-coalesced reads.
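To make that concrete, here is a minimal sketch of reading a global-memory buffer through a texture, using the classic texture-reference API (`texture<>` / `tex1Dfetch` / `cudaBindTexture`); the kernel name and buffer names are just illustrative:

```cuda
// Texture references must be declared at file scope.
texture<float, 1, cudaReadModeElementType> tex;

// Each thread fetches one element through the texture cache instead of
// reading global memory directly.
__global__ void read_through_texture(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch(tex, i);  // cached, read-only fetch
}

// Host side (sketch): bind an existing cudaMalloc'd buffer d_data to the
// texture before launching the kernel.
//   cudaBindTexture(0, tex, d_data, n * sizeof(float));
```

The point is that the fetches go through the texture cache, so nearby threads reading nearby elements don’t each pay the full cost of an uncoalesced global read.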
There is an interesting example of coalescing in the matrix multiplication example. The basic approach doesn’t do coalesced reads, so the authors propose another approach to fix this problem and speed up the entire process. Do you think the basic approach using textures can be faster than the other approach, which is more complicated but does coalesced reads and writes?
I don’t know what you mean by the ‘basic’ approach and the ‘other’ approach. What I do know is that if your memory reads and/or writes are coalesced, then you won’t benefit from using textures. If they’re not, then you should definitely try textures.
Matrix multiplication is a very simple algorithm. Actually, the kernel can be written in maybe 5 lines. The problem is that the reads are not coalesced. This is the basic approach.
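For reference, the basic approach I mean looks roughly like this (a sketch assuming square row-major N×N matrices; the kernel name is just illustrative):

```cuda
// Naive matrix multiply: C = A * B, one thread per output element.
// Each thread walks a full row of A and a full column of B straight out
// of global memory, so the access pattern is not coalesced-friendly.
__global__ void matmul_naive(const float *A, const float *B,
                             float *C, int N)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float sum = 0.0f;
        for (int k = 0; k < N; ++k)
            sum += A[row * N + k] * B[k * N + col];
        C[row * N + col] = sum;
    }
}
```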
NVIDIA proposes another approach that uses shared memory to deal with the coalescing problem. This is the improved approach.
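That improved approach is the well-known tiled version: each block stages a tile of A and a tile of B in shared memory with coalesced loads, then does the arithmetic out of shared memory. A sketch (assuming N is a multiple of the tile size; names are illustrative):

```cuda
#define TILE 16

// Tiled matrix multiply: C = A * B. Each block computes a TILE x TILE
// tile of C. Global-memory loads are coalesced because consecutive
// threads (threadIdx.x) load consecutive addresses.
__global__ void matmul_tiled(const float *A, const float *B,
                             float *C, int N)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float sum = 0.0f;

    for (int t = 0; t < N / TILE; ++t) {
        // Cooperative, coalesced loads of one tile of A and one of B.
        As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
        __syncthreads();

        // Inner product over the tile, entirely out of shared memory.
        for (int k = 0; k < TILE; ++k)
            sum += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    C[row * N + col] = sum;
}
```

Each element of A and B is loaded from global memory only once per tile instead of once per thread, which is where the speed-up comes from.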
I was wondering whether it would be more interesting to use the basic approach with textures than to use shared memory with a slightly more complex algorithm.
In short, which is best:
simple algorithm + texture VS “complex” algorithm + global + shared memory
Maybe there is no good answer to this question. I’m just trying to understand :)
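The “simple algorithm + texture” variant I have in mind would be the naive kernel with the global reads replaced by texture fetches, roughly like this (again using the texture-reference API; names are illustrative):

```cuda
// Texture references for the two input matrices, at file scope.
texture<float, 1, cudaReadModeElementType> texA;
texture<float, 1, cudaReadModeElementType> texB;

// Same one-thread-per-element structure as the naive kernel, but every
// read goes through the texture cache.
__global__ void matmul_tex(float *C, int N)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float sum = 0.0f;
        for (int k = 0; k < N; ++k)
            sum += tex1Dfetch(texA, row * N + k)
                 * tex1Dfetch(texB, k * N + col);
        C[row * N + col] = sum;
    }
}

// Host side (sketch): bind the matrices before launching.
//   cudaBindTexture(0, texA, d_A, N * N * sizeof(float));
//   cudaBindTexture(0, texB, d_B, N * N * sizeof(float));
```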
I think the rule of thumb is that coalesced global memory access combined with conflict-free shared memory access is best. Shared memory bandwidth is hundreds of gigabytes per second: http://forums.nvidia.com/index.php?showtop…027&mode=linear
Texture memory is next best because it’s automatically cached: when you read from a certain texture position, the region around it is cached too. It should still be slower than shared memory, but I’m not sure. You’ll have to search a bit to get complete information; this is just my current picture of things.