Generally speaking, no, textures are not faster than global memory if your access pattern is coalesced. Textures are faster if you need to read elements in an uncoalesced, but spatially local, way. For example, if your thread is going to loop over a small square region inside a larger array, a texture can help. Textures won’t help you if you are accessing memory with a large stride, like looping through a big row-major array in column-major order.
That said, using a texture is a standard workaround for reading float4 arrays. For some reason, coalesced reads from float4 arrays achieve a little more than half the memory bandwidth of coalesced reads from float and float2 arrays. But if you read your 1D float4 array through a texture reference, then you can get full performance again.
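Here is a minimal sketch of reading a 1D float4 array through a texture bound to linear memory. One caveat: it uses the texture object API (cudaCreateTextureObject) rather than the texture references discussed in this thread, since, as far as I know, references are deprecated in recent toolkits and removed in CUDA 12. The kernel name sumComponents and the helper runFloat4TextureDemo are made up for illustration, and the sketch makes no claim about the bandwidth you will actually measure on your hardware.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread fetches one float4 through the texture path and writes the
// sum of its components to ordinary global memory.
__global__ void sumComponents(cudaTextureObject_t tex, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float4 v = tex1Dfetch<float4>(tex, i);  // integer index, no filtering
        out[i] = v.x + v.y + v.z + v.w;
    }
}

// Hypothetical helper: returns the number of mismatches, or -1 on a CUDA error.
int runFloat4TextureDemo(int n)
{
    std::vector<float4> h_in(n);
    for (int i = 0; i < n; ++i)
        h_in[i] = make_float4((float)i, 1.0f, 2.0f, 3.0f);

    float4 *d_in = nullptr;
    float *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(float4)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) return -1;
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float4), cudaMemcpyHostToDevice);

    // Describe the cudaMalloc'ed buffer that the texture will read from.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = d_in;
    res.res.linear.desc = cudaCreateChannelDesc<float4>();
    res.res.linear.sizeInBytes = n * sizeof(float4);

    // Return the stored floats unchanged (no normalization).
    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    if (cudaCreateTextureObject(&tex, &res, &td, nullptr) != cudaSuccess) return -1;

    int threads = 256;
    sumComponents<<<(n + threads - 1) / threads, threads>>>(tex, d_out, n);

    std::vector<float> h_out(n);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);

    // Every element should equal i + 1 + 2 + 3.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != (float)i + 6.0f) ++mismatches;

    cudaDestroyTextureObject(tex);
    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess ? mismatches : -1;
}

int main()
{
    int bad = runFloat4TextureDemo(1 << 20);
    printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

The kernel only reads through the texture and writes its result to a separate global array, so the cache coherency caveat about writing to texture-bound memory does not come up here.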
Texture references can be bound to linear memory (that is, global memory that you allocated with cudaMalloc) or to cudaArrays. You can write to linear memory from a kernel, although texture cache coherency is not preserved within a single kernel call: if the address you wrote to was already in the texture cache and you read the texture again, you’ll get the old value. Since you are just writing to global memory, the write is the same speed as any other global memory write.
You cannot write to cudaArrays (which are required for 2D or 3D textures) from a kernel, because they pack data into memory in some special format, which NVIDIA doesn’t document.
The simpleTexture SDK sample shows basic texture usage, and Appendix D.3 shows how to index the array for table lookup. Unfortunately, you’ll still have to fiddle a bit to put the two together. I haven’t found any other simple examples of using textures like an array.
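As a rough sketch of putting the two together, here is a 1D table stored in a cudaArray and read with linear filtering. It assumes the lookup formula from the programming guide is, as I remember it, TL(x) = tex((N-1)/R * x + 0.5) for x in [0, R] with unnormalized coordinates, so check that against your copy of the guide. It also uses the newer texture object API instead of texture references. The names lookup and runTableLookupDemo, the table contents, and the tolerance are all made up for illustration.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Look up x[i] in an N-entry table covering [0, R], with the hardware
// interpolating linearly between neighbouring entries.
__global__ void lookup(cudaTextureObject_t tex, const float *x, float *y,
                       int m, int N, float R)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < m)
        y[i] = tex1D<float>(tex, (N - 1) / R * x[i] + 0.5f);
}

// Hypothetical helper: returns the number of failed checks, or -1 on a CUDA error.
int runTableLookupDemo()
{
    const int N = 5;
    const float R = 4.0f;
    std::vector<float> table(N);
    for (int k = 0; k < N; ++k) table[k] = (float)(k * k);  // T = {0, 1, 4, 9, 16}

    // Filtered textures need a cudaArray; height 0 makes it one-dimensional.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr = nullptr;
    if (cudaMallocArray(&arr, &desc, N, 0) != cudaSuccess) return -1;
    cudaMemcpy2DToArray(arr, 0, 0, table.data(), N * sizeof(float),
                        N * sizeof(float), 1, cudaMemcpyHostToDevice);

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;

    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;   // clamp reads past either end
    td.filterMode = cudaFilterModeLinear;       // interpolate between entries
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;                    // coordinates in [0, N)

    cudaTextureObject_t tex = 0;
    if (cudaCreateTextureObject(&tex, &res, &td, nullptr) != cudaSuccess) return -1;

    // Query both ends of the range and one point halfway between T[1] and T[2].
    const int m = 3;
    float h_x[m] = {0.0f, 4.0f, 1.5f};
    float h_expected[m] = {0.0f, 16.0f, 2.5f};
    float *d_x = nullptr, *d_y = nullptr;
    cudaMalloc(&d_x, m * sizeof(float));
    cudaMalloc(&d_y, m * sizeof(float));
    cudaMemcpy(d_x, h_x, m * sizeof(float), cudaMemcpyHostToDevice);

    lookup<<<1, 32>>>(tex, d_x, d_y, m, N, R);

    float h_y[m];
    cudaError_t err = cudaMemcpy(h_y, d_y, m * sizeof(float), cudaMemcpyDeviceToHost);

    // Filtering weights have limited precision, so compare with a tolerance.
    int failures = 0;
    for (int i = 0; i < m; ++i)
        if (std::fabs(h_y[i] - h_expected[i]) > 1e-2f) ++failures;

    cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
    cudaFree(d_x);
    cudaFree(d_y);
    return err == cudaSuccess ? failures : -1;
}

int main()
{
    int bad = runTableLookupDemo();
    printf("failed checks: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

The + 0.5 offset is there because, with unnormalized coordinates and linear filtering, entry k is sampled exactly at coordinate k + 0.5. If you only need exact integer indexing with no interpolation, tex1Dfetch on linear memory is the simpler route.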
Hi guys, thanks for the advice. From the comments it seems that reading info from a texture is generally no different than using global memory. So what’s the point of it then? Thanks, I appreciate all of your help.