Texture vs. Global Memory

If that is truly the memory access pattern of your real application, constant memory will serve you much better. It is optimized for the case where all threads in a warp read exactly the same element at the same time, so one fetch can be broadcast to the whole warp.
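As a rough sketch of what that looks like (the names `c_table`, `TABLE_SIZE` and `weightedSum` are made up for illustration, and error checking is left out), every thread in a warp reads the same `c_table[k]` on each loop iteration:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical lookup table: 256 floats (1 KB), well under the 64 KB limit.
#define TABLE_SIZE 256
__constant__ float c_table[TABLE_SIZE];

// On each iteration k, all threads of a warp read the same c_table[k],
// which is the uniform access pattern constant memory is meant for.
__global__ void weightedSum(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float acc = 0.0f;
    for (int k = 0; k < TABLE_SIZE; ++k)
        acc += in[i] * c_table[k];
    out[i] = acc;
}

int main()
{
    const int n = 1024;
    float h_table[TABLE_SIZE], h_in[n], h_out[n];
    for (int k = 0; k < TABLE_SIZE; ++k) h_table[k] = 1.0f;
    for (int i = 0; i < n; ++i) h_in[i] = 2.0f;

    // Constant memory is written from the host with cudaMemcpyToSymbol.
    cudaMemcpyToSymbol(c_table, h_table, sizeof(h_table));

    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    weightedSum<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("out[0] = %f\n", h_out[0]);  // 2.0 * 1.0 summed 256 times = 512

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

If threads in a warp indexed `c_table` with different values of `k` at the same time, the reads would be serialized and most of the benefit would be lost.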

If your real dataset is larger than the 64 KB that constant memory can hold, the advised approach is to do periodic coalesced reads into shared memory and have the inner loops read from that shared memory.
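A minimal sketch of that staging pattern, assuming a 1D block of exactly `TILE` threads and a hypothetical kernel `sumOverTable` that combines each input element with every table entry (the arithmetic is only a placeholder for your real inner loop):

```cuda
#include <cuda_runtime.h>

#define TILE 256  // assumed tile size; launch with blockDim.x == TILE

__global__ void sumOverTable(const float* table, int tableSize,
                             const float* in, float* out, int n)
{
    __shared__ float tile[TILE];

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // No early return: every thread must reach the __syncthreads() calls.
    float x = (i < n) ? in[i] : 0.0f;
    float acc = 0.0f;

    for (int base = 0; base < tableSize; base += TILE) {
        // Coalesced load: consecutive threads read consecutive elements.
        int j = base + threadIdx.x;
        tile[threadIdx.x] = (j < tableSize) ? table[j] : 0.0f;
        __syncthreads();

        // Inner loop reads only shared memory. All threads read the same
        // tile[k] at once, which shared memory serves as a broadcast.
        int count = min(TILE, tableSize - base);
        for (int k = 0; k < count; ++k)
            acc += x * tile[k];

        // Make sure everyone is done with this tile before it is overwritten.
        __syncthreads();
    }

    if (i < n) out[i] = acc;
}
```

It would be launched along the lines of `sumOverTable<<<(n + TILE - 1) / TILE, TILE>>>(d_table, tableSize, d_in, d_out, n);`, where `d_table`, `d_in` and `d_out` are device pointers you have already allocated.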

Textures are actually most useful when you cannot coalesce a warp-wide memory read because of a semi-random read pattern that still has spatial locality among the threads' reads. The texture cache is not large enough to give any performance gain from temporal locality.
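For reference, here is a hedged sketch of a gather through the texture path using the texture object API (this assumes a device of compute capability 3.0 or later; older code would use texture references instead). The index array `d_idx` and the kernel name `gather` are hypothetical, the buffers are only zero-filled placeholders, and error checking is omitted:

```cuda
#include <cuda_runtime.h>

// Each thread fetches data[idx[i]] through the texture cache. This is meant
// for the case where idx[] is irregular but neighbouring threads tend to
// hit nearby elements.
__global__ void gather(cudaTextureObject_t tex, const int* idx,
                       float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch<float>(tex, idx[i]);
}

int main()
{
    const int n = 1 << 20;
    float *d_data, *d_out;
    int* d_idx;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMalloc(&d_idx, n * sizeof(int));
    cudaMemset(d_data, 0, n * sizeof(float));  // placeholder data
    cudaMemset(d_idx, 0, n * sizeof(int));     // placeholder indices (all 0)

    // Describe the existing linear device buffer as a texture resource.
    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeLinear;
    resDesc.res.linear.devPtr = d_data;
    resDesc.res.linear.desc = cudaCreateChannelDesc<float>();
    resDesc.res.linear.sizeInBytes = n * sizeof(float);

    // Return the stored float values unchanged.
    cudaTextureDesc texDesc = {};
    texDesc.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr);

    gather<<<(n + 255) / 256, 256>>>(tex, d_idx, d_out, n);
    cudaDeviceSynchronize();

    cudaDestroyTextureObject(tex);
    cudaFree(d_data);
    cudaFree(d_out);
    cudaFree(d_idx);
    return 0;
}
```

Whether this beats plain global loads depends on the hardware and on how much locality the indices really have, so it is worth timing both versions on your own data.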