I am trying to speed up my CUDA program by reducing the time my kernels spend accessing memory. Binding linear memory to a texture object gave me a dramatic improvement over reading directly from global memory.
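For context, here is a minimal sketch of the kind of linear-memory setup I mean. It is not my actual code: the `float` element type, the kernel `copyThroughTexture`, and the helper `linearTextureRoundTrip` are all placeholders made up for illustration, and error handling is reduced to a single status check.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread fetches one element through the texture object and stores it.
__global__ void copyThroughTexture(cudaTextureObject_t tex, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch<float>(tex, i);  // integer index, no filtering
}

// Hypothetical helper: returns true if the data read back through the
// texture matches what was uploaded.
bool linearTextureRoundTrip(int n)
{
    std::vector<float> host(n);
    for (int i = 0; i < n; ++i) host[i] = static_cast<float>(i);

    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMemcpy(dIn, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Describe the resource: plain linear device memory holding floats.
    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeLinear;
    resDesc.res.linear.devPtr = dIn;
    resDesc.res.linear.desc = cudaCreateChannelDesc<float>();
    resDesc.res.linear.sizeInBytes = n * sizeof(float);

    // Describe how it is read: raw element values, no conversion.
    cudaTextureDesc texDesc = {};
    texDesc.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr);

    copyThroughTexture<<<(n + 255) / 256, 256>>>(tex, dOut, n);

    std::vector<float> result(n);
    // This blocking copy also waits for the kernel to finish.
    cudaError_t status = cudaMemcpy(result.data(), dOut, n * sizeof(float),
                                    cudaMemcpyDeviceToHost);

    cudaDestroyTextureObject(tex);
    cudaFree(dIn);
    cudaFree(dOut);
    return status == cudaSuccess && result == host;
}
```

Calling `linearTextureRoundTrip(1 << 20)` from a `main` and building with `nvcc` should be enough to try it; timing would need `cudaEvent_t` markers around the kernel launch, which I left out to keep the sketch short.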
My question is: could I get even better timings if I bound the texture to a CUDA array instead of linear memory? Or are CUDA arrays only useful for getting access to the filtering functions and the more elaborate addressing modes when fetching?
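To make the alternative concrete, this is roughly what I understand the CUDA array variant would look like. It is a sketch under assumptions, not tested against my real workload: it assumes 2D `float` data, point filtering, and unnormalized coordinates, and the names `copyThroughTexture2D` and `arrayTextureRoundTrip` are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread fetches one texel from the 2D texture and stores it row-major.
__global__ void copyThroughTexture2D(cudaTextureObject_t tex, float *out,
                                     int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        // +0.5f addresses the texel center with unnormalized coordinates.
        out[y * width + x] = tex2D<float>(tex, x + 0.5f, y + 0.5f);
}

// Hypothetical helper: returns true if the data read back through the
// array-backed texture matches what was uploaded.
bool arrayTextureRoundTrip(int width, int height)
{
    const size_t count = static_cast<size_t>(width) * height;
    std::vector<float> host(count);
    for (size_t i = 0; i < count; ++i) host[i] = static_cast<float>(i);

    // Allocate an opaque CUDA array and copy the host data into it.
    cudaChannelFormatDesc channel = cudaCreateChannelDesc<float>();
    cudaArray_t array = nullptr;
    cudaMallocArray(&array, &channel, width, height);
    cudaMemcpy2DToArray(array, 0, 0, host.data(), width * sizeof(float),
                        width * sizeof(float), height, cudaMemcpyHostToDevice);

    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = array;

    // Filtering and addressing modes are configured here.
    cudaTextureDesc texDesc = {};
    texDesc.addressMode[0] = cudaAddressModeClamp;
    texDesc.addressMode[1] = cudaAddressModeClamp;
    texDesc.filterMode = cudaFilterModePoint;
    texDesc.readMode = cudaReadModeElementType;
    texDesc.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr);

    float *dOut = nullptr;
    cudaMalloc(&dOut, count * sizeof(float));
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    copyThroughTexture2D<<<grid, block>>>(tex, dOut, width, height);

    std::vector<float> result(count);
    // This blocking copy also waits for the kernel to finish.
    cudaError_t status = cudaMemcpy(result.data(), dOut, count * sizeof(float),
                                    cudaMemcpyDeviceToHost);

    cudaDestroyTextureObject(tex);
    cudaFree(dOut);
    cudaFreeArray(array);
    return status == cudaSuccess && result == host;
}
```

The only structural differences from a linear-memory texture are the `cudaMallocArray`/`cudaMemcpy2DToArray` allocation path and the `cudaResourceTypeArray` descriptor. What I cannot tell from this is whether the fetches themselves end up any faster.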