I find that textures are not explained very well in the Programming Guide/Reference Manual.
A Texture Reference must be declared at file scope:
texture<Type, Dim, ReadMode> texRef;
e.g. texture<float, 2, cudaReadModeElementType> tex1;
declares a texture reference called tex1 which is 2D, returns floats, and is accessed using non-normalised coordinates.
We will assume that we want to use the texture to access a Cuda Array, so in main() we must declare a pointer to one:
cudaArray* cuA;
Now we need a Channel Format Descriptor:
cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);
The int x, int y, int z and int w arguments are the number of bits in each component (channel) of a single texel, not anything to do with the dimensions of the array. A texel can have up to four components (as in a float4); since each of our texels is a single 32-bit float, only 'x' is non-zero. That is why, even in the 2D examples, the descriptor is (32, 0, 0, 0): the descriptor describes the format of one element, and the hardware uses it to interpret the data stored in the Cuda Array.
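To make the role of x, y, z and w concrete, here is a small sketch (the descriptor variable names are mine) showing the descriptors for a few different texel types:

```cuda
#include <cuda_runtime.h>

int main()
{
    // One 32-bit float per texel: only x is non-zero.
    cudaChannelFormatDesc descF  = cudaCreateChannelDesc(32, 0, 0, 0,
                                       cudaChannelFormatKindFloat);
    // A float4 texel has four 32-bit components.
    cudaChannelFormatDesc descF4 = cudaCreateChannelDesc(32, 32, 32, 32,
                                       cudaChannelFormatKindFloat);
    // An unsigned char texel has a single 8-bit component.
    cudaChannelFormatDesc descU8 = cudaCreateChannelDesc(8, 0, 0, 0,
                                       cudaChannelFormatKindUnsigned);
    // The templated form fills in the same fields from the type:
    cudaChannelFormatDesc descT  = cudaCreateChannelDesc<float>();

    (void)descF; (void)descF4; (void)descU8; (void)descT;
    return 0;
}
```

The templated cudaCreateChannelDesc<float>() on the last line produces exactly the same descriptor as the explicit (32, 0, 0, 0, cudaChannelFormatKindFloat) call, and is less error-prone.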
We need to allocate memory on the GPU for our Cuda Array now:
cudaMallocArray( &cuA, &channelDesc, 3, 1 );
This allocates a 3 x 1 array of floats: the width and height arguments are given in array elements, not bytes (the element size comes from the channel descriptor).
We need to put something into our Cuda Array:
cudaMemcpyToArray( cuA, 0, 0, h_data, 3*sizeof(float), cudaMemcpyHostToDevice );
This copies 3 floats from the host array h_data into our Cuda Array 'cuA'. The two zeros are the width and height offsets into the destination array, and the count here is in bytes.
We need to bind our texture (tex1) to the Cuda Array, so that we can access the Cuda Array through the texture:
cudaBindTextureToArray(tex1, cuA, channelDesc);
We can now access our Cuda Array through the texture (which will hopefully improve performance due to caching) like this:
dat[threadIdx.x] = tex2D(tex1, threadIdx.x, 0);
(where 'dat' is a pointer to some memory allocated on the device for floats)
This accesses cuA[threadIdx.x]: with non-normalised coordinates and the default point filtering mode, integer coordinates fetch individual texels.
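Putting the steps above together, a minimal end-to-end sketch might look like this (the host array, kernel and variable names are mine, and error checking is omitted for brevity):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// The texture reference must be declared at file scope.
texture<float, 2, cudaReadModeElementType> tex1;

__global__ void copyKernel(float* dat)
{
    // Fetch cuA[threadIdx.x] through the texture cache.
    dat[threadIdx.x] = tex2D(tex1, threadIdx.x, 0);
}

int main()
{
    float h_data[3] = { 1.0f, 2.0f, 3.0f };

    cudaChannelFormatDesc channelDesc =
        cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);

    cudaArray* cuA;
    cudaMallocArray(&cuA, &channelDesc, 3, 1);   // 3 x 1 array of floats
    cudaMemcpyToArray(cuA, 0, 0, h_data, 3 * sizeof(float),
                      cudaMemcpyHostToDevice);
    cudaBindTextureToArray(tex1, cuA, channelDesc);

    float* dat;
    cudaMalloc(&dat, 3 * sizeof(float));
    copyKernel<<<1, 3>>>(dat);                   // one block of 3 threads

    float h_out[3];
    cudaMemcpy(h_out, dat, 3 * sizeof(float), cudaMemcpyDeviceToHost);
    printf("%f %f %f\n", h_out[0], h_out[1], h_out[2]);

    cudaUnbindTexture(tex1);
    cudaFree(dat);
    cudaFreeArray(cuA);
    return 0;
}
```

Note the cudaUnbindTexture and cudaFreeArray calls at the end, which release the binding and the array once we are done with them.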
It is true that a Cuda Array cannot be written from a kernel, but binding does not make a copy: the texture reads directly from cuA through the texture cache. If we update cuA from the host (e.g. with another cudaMemcpyToArray) between kernel launches, the next launch will see the new values without re-binding. What is not safe is modifying memory that a texture is bound to within the same kernel launch that reads it through the texture, because the texture cache is not kept coherent with such writes during a launch.
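The following complete sketch (all names are mine) demonstrates that a host-side update of the bound array is visible to the next kernel launch with no re-bind:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

texture<float, 2, cudaReadModeElementType> texA;

__global__ void fetchOne(float* out)
{
    *out = tex2D(texA, 0, 0);   // read texel (0,0) through the texture
}

int main()
{
    cudaChannelFormatDesc desc =
        cudaCreateChannelDesc(32, 0, 0, 0, cudaChannelFormatKindFloat);
    cudaArray* arr;
    cudaMallocArray(&arr, &desc, 1, 1);   // a 1 x 1 array of floats

    float* d_out;
    cudaMalloc(&d_out, sizeof(float));

    float v = 1.0f, result;
    cudaMemcpyToArray(arr, 0, 0, &v, sizeof(float), cudaMemcpyHostToDevice);
    cudaBindTextureToArray(texA, arr, desc);
    fetchOne<<<1, 1>>>(d_out);
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("first launch:  %f\n", result);   // sees 1.0

    v = 2.0f;   // update the array from the host between launches...
    cudaMemcpyToArray(arr, 0, 0, &v, sizeof(float), cudaMemcpyHostToDevice);
    fetchOne<<<1, 1>>>(d_out);               // ...no re-bind before this launch
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("second launch: %f\n", result);   // sees 2.0

    cudaUnbindTexture(texA);
    cudaFree(d_out);
    cudaFreeArray(arr);
    return 0;
}
```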
I believe that one does not have to worry about coalescing memory reads when reading a texture.
I have read that texture reads are fastest when one is accessing cells of the array that are close together ('2D spatial locality'), which has to do with how the texture cache works: Cuda Arrays are stored in a layout optimised for 2D locality, so nearby texels tend to land in the cache together. I understand this to mean that if I am accessing [a][b] and [a+1][b], say, I will probably get better speed than if I accessed [a+54][b] instead.
The Programming Guide gives the texture cache working set as 6 to 8 KB per multiprocessor. This is separate from the 16 KB of Shared Memory, so using the texture cache does not reduce available Shared Memory. It is therefore a good idea to read memory that does not need to be modified within a kernel through textures: this leaves Shared Memory free for data that does need to be modified, while still reducing accesses to global memory.