I believe there are alignment constraints when binding memory with cudaBindTexture2D. That isn't entirely unexpected, but they are not specified in the reference manual (or anywhere else that I can find). It seems that the width of each row, in bytes, must be a multiple of 32 to get correct results. Memory allocated with cudaMallocPitch is always correctly aligned, but this causes problems of its own: CUFFT's routines, for example, do not take a pitch argument, so memory allocated this way can't be used with CUFFT.
To be clear, this is not the alignment reported through the offset parameter of cudaBindTexture2D, which is always returned as 0. It is a constraint on the width of individual rows.
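To make the row-width constraint concrete, here is a minimal host-side sketch (plain C++, so it compiles unchanged in a .cu file). The 32-byte figure is only what I observed, not a documented value, and the helper names paddedPitchBytes and rowIsAligned are my own, not part of the CUDA API.

```cpp
#include <cassert>
#include <cstddef>

// ASSUMPTION: 32 bytes is the row alignment observed in this post,
// not a value taken from the CUDA documentation.
constexpr std::size_t kAssumedRowAlignBytes = 32;

// Round a row width (in bytes) up to the next multiple of `align`.
// The result is the smallest pitch that satisfies the assumed constraint.
inline std::size_t paddedPitchBytes(std::size_t rowBytes,
                                    std::size_t align = kAssumedRowAlignBytes) {
    assert(align > 0);
    return ((rowBytes + align - 1) / align) * align;
}

// True when a tightly packed row (pitch == width) already meets the
// assumed alignment, so no padding would be needed.
inline bool rowIsAligned(std::size_t rowBytes,
                         std::size_t align = kAssumedRowAlignBytes) {
    assert(align > 0);
    return rowBytes % align == 0;
}
```

For example, a row of 100 floats is 400 bytes, which is not a multiple of 32, so it would need a pitch of 416 bytes. A row of 128 floats is 512 bytes and is already aligned, so under this assumption a tightly packed buffer of that width could serve both the texture binding and CUFFT. If your toolkit's cudaDeviceProp has a texturePitchAlignment field, that value is probably a better choice than the hard-coded 32.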
The attached code should illustrate the problem.
Perhaps the reference manual should be updated to reflect this requirement. In the longer term, it would be nice if CUFFT could handle memory allocated with pitch != width.
textureTest.cu (3.15 KB)