I am writing a kernel that must read one matrix (stored in row-major order) on each call. I allocated the matrix with MemAllocPitch and have each thread read one element, so the memory reads are coalesced. In this case, is it worth reading the matrix with a 1D texture fetch instead? What advantages does texture fetching have over coalesced memory reads?
Texture fetching has no advantage over coalesced global memory reads, except when reading 128-bit types. Its main use is for reads that are "almost coalesced": the texture cache serves neighbouring accesses from the same warp, so such reads perform considerably better through a texture than as uncoalesced global loads.
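For comparison, here is a minimal sketch of the two access paths for a pitched, row-major float matrix. It assumes the runtime API (`cudaMallocPitch` standing in for the MemAllocPitch call from the question) and a texture object, which needs compute capability 3.0 or later; the older texture-reference API was removed in CUDA 12. The names `copyDirect`, `copyTex` and `readsAgree` are hypothetical. Both kernels read one element per thread. The only real difference is that a 1D fetch takes an element index, so the pitch has to be converted from bytes to elements.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Direct read: consecutive threads read consecutive floats of one pitched
// row, so each warp's loads are coalesced. One grid row per matrix row
// (assumes height <= 65535, the gridDim.y limit).
__global__ void copyDirect(const float* src, size_t pitchBytes,
                           float* dst, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y;
    if (x < width && y < height) {
        const float* row = (const float*)((const char*)src + y * pitchBytes);
        dst[y * width + x] = row[x];
    }
}

// Same access pattern through a 1D texture bound to the pitched buffer.
// tex1Dfetch indexes by element, so the pitch is passed in elements.
__global__ void copyTex(cudaTextureObject_t tex, size_t pitchElems,
                        float* dst, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y;
    if (x < width && y < height) {
        dst[y * width + x] = tex1Dfetch<float>(tex, (int)(y * pitchElems + x));
    }
}

// Hypothetical helper: runs both kernels and returns true if both
// reproduce the host matrix exactly.
bool readsAgree(int width, int height)
{
    size_t n = (size_t)width * height;
    std::vector<float> h(n);
    for (size_t i = 0; i < n; ++i) h[i] = (float)i;

    float* src = nullptr;
    size_t pitch = 0;  // row stride in bytes, chosen by the runtime
    cudaMallocPitch((void**)&src, &pitch, width * sizeof(float), height);
    assert(pitch % sizeof(float) == 0);  // needed for the element index
    cudaMemcpy2D(src, pitch, h.data(), width * sizeof(float),
                 width * sizeof(float), height, cudaMemcpyHostToDevice);

    float *dA = nullptr, *dB = nullptr;
    cudaMalloc((void**)&dA, n * sizeof(float));
    cudaMalloc((void**)&dB, n * sizeof(float));

    // Bind the whole pitched allocation (padding included) as linear memory.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = src;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = pitch * height;
    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;
    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    dim3 block(128);
    dim3 grid((width + block.x - 1) / block.x, height);
    copyDirect<<<grid, block>>>(src, pitch, dA, width, height);
    copyTex<<<grid, block>>>(tex, pitch / sizeof(float), dB, width, height);
    bool ok = cudaDeviceSynchronize() == cudaSuccess;

    std::vector<float> a(n), b(n);
    cudaMemcpy(a.data(), dA, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(b.data(), dB, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaDestroyTextureObject(tex);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(src);
    return ok && a == h && b == h;
}
```

On a GPU, `readsAgree(1000, 37)` should return true. Since this access pattern is already fully coalesced, timing `copyDirect` against `copyTex` on your own hardware is the simplest way to see whether the texture path buys anything.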