I would like to allocate an array of, say, 14 x 13 floating point values and bind a texture to it.
If I use cudaMallocPitch to allocate the array, I can bind a texture to it using cudaBindTexture2D, and all operations on the texture (e.g., interpolation) work just fine.
If I use cudaMalloc (that is, the rows of the array are not padded out to a 64-byte pitch), I can still bind a texture using cudaBindTexture2D, but most operations on the texture do not return correct results.
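For reference, the working path looks roughly like this, as a minimal sketch using the texture-reference API from this thread (removed in CUDA 12; kernel and variable names are placeholders):

```
// Sketch: pitched allocation + cudaBindTexture2D (texture-reference API).
#include <cstdio>
#include <cuda_runtime.h>

texture<float, cudaTextureType2D, cudaReadModeElementType> tex;

__global__ void copyViaTex(float *out, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < w && y < h)
        out[y * w + x] = tex2D(tex, x + 0.5f, y + 0.5f); // sample texel centers
}

int main(void)
{
    const int w = 14, h = 13;
    float *d_data;
    size_t pitch; // pitch in bytes, chosen by the runtime
    cudaMallocPitch((void**)&d_data, &pitch, w * sizeof(float), h);

    // The pitch returned by cudaMallocPitch satisfies the texture's
    // row-alignment requirement, so binding succeeds and reads are correct.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    size_t offset;
    cudaBindTexture2D(&offset, tex, d_data, desc, w, h, pitch);

    float *d_out;
    cudaMalloc((void**)&d_out, w * h * sizeof(float));
    dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
    copyViaTex<<<grid, block>>>(d_out, w, h);
    cudaDeviceSynchronize();

    cudaUnbindTexture(tex);
    cudaFree(d_out);
    cudaFree(d_data);
    return 0;
}
```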
Textures impose certain alignment restrictions on the underlying memory, which frequently also require padding at the end of each row. Using either a cudaArray or pitch-linear memory allocated with cudaMallocPitch() guarantees that all requirements are met. You can get things to work with cudaMalloc(), but you have to take care of alignment and padding yourself. I am not sure all requirements are publicly documented, so I would not recommend going down that path.
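If you do want to roll the layout by hand, the basic alignment constraints are exposed through the device properties; here is a small sketch (per the caveat above, the full set of requirements may go beyond these two fields):

```
// Query the documented texture alignment constraints.
#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // Base-address alignment for textures bound to linear memory, and
    // row-pitch alignment for 2D textures bound to pitch-linear memory.
    printf("textureAlignment:      %zu bytes\n", prop.textureAlignment);
    printf("texturePitchAlignment: %zu bytes\n", prop.texturePitchAlignment);

    // A hand-rolled pitch for a 14-float row would need rounding up:
    size_t rowBytes = 14 * sizeof(float);
    size_t align    = prop.texturePitchAlignment;
    size_t pitch    = (rowBytes + align - 1) / align * align;
    printf("padded row pitch:      %zu bytes\n", pitch);
    return 0;
}
```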
What about copying linear data to a cudaArray (or other 2D/3D allocation) and using that for texture binding?
I’ve been able to copy 1D data to a 3D/2DLayered/2D/1DLayered cudaArray without any issues as far as I can tell. Could this be problematic for certain data types?
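For what it's worth, the copy path looks roughly like this for the plain 2D case (a sketch reusing the 14 x 13 dimensions from the original question): cudaMemcpy2DToArray takes the source pitch explicitly, so a tightly packed linear buffer works, and the driver handles the cudaArray's opaque internal layout.

```
// Sketch: copy tightly packed linear data into a 2D cudaArray and bind it.
#include <cuda_runtime.h>

texture<float, cudaTextureType2D, cudaReadModeElementType> tex;

int main(void)
{
    const int width = 14, height = 13;
    float h_data[height * width]; // tightly packed, no row padding
    for (int i = 0; i < width * height; ++i) h_data[i] = (float)i;

    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t arr;
    cudaMallocArray(&arr, &desc, width, height);

    // Source pitch is just width * sizeof(float); no manual padding needed.
    cudaMemcpy2DToArray(arr, 0, 0, h_data,
                        width * sizeof(float),  // source pitch in bytes
                        width * sizeof(float),  // width of copied region in bytes
                        height, cudaMemcpyHostToDevice);

    cudaBindTextureToArray(tex, arr, desc);
    // ... launch kernels that read tex2D(tex, x, y) ...

    cudaUnbindTexture(tex);
    cudaFreeArray(arr);
    return 0;
}
```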