char4 vs Texture

Hello all!

I have a kernel that iterates linearly through a big char array bound to a texture reference. The kernel computes a hash function over every byte of the array, in order (i.e. a[0] + a[1] + a[2] + … ). The strange thing is that I get worse performance when I fetch the data as char4 than when I fetch it as plain char. In the char4 version, I read the four bytes of each fetch through the .x, .y, .z, .w fields.
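To make the comparison concrete, here is a minimal sketch of the two variants, not the actual kernel. Several things in it are assumptions: the hash is a plain byte sum, each thread walks its own contiguous chunk, and the names `makeLinearTex`, `hashChar`, `hashChar4` and `compareVariants` are made up for illustration. It also uses texture objects rather than the legacy texture reference API, since texture references were removed in CUDA 12.

```cuda
#include <cstdio>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: texture object over linear device memory with element type T.
template <typename T>
cudaTextureObject_t makeLinearTex(void *devPtr, size_t bytes)
{
    cudaResourceDesc res;
    memset(&res, 0, sizeof(res));
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = devPtr;
    res.res.linear.desc = cudaCreateChannelDesc<T>();
    res.res.linear.sizeInBytes = bytes;
    cudaTextureDesc td;
    memset(&td, 0, sizeof(td));
    td.readMode = cudaReadModeElementType;
    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);
    return tex;
}

// Variant 1: one byte per texture fetch (assumed hash: plain byte sum).
__global__ void hashChar(cudaTextureObject_t tex, int chunkBytes, unsigned *out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int base = tid * chunkBytes;
    unsigned h = 0;
    for (int i = 0; i < chunkBytes; ++i)
        h += (unsigned char)tex1Dfetch<char>(tex, base + i);
    out[tid] = h;
}

// Variant 2: four bytes per texture fetch, read through .x, .y, .z, .w.
__global__ void hashChar4(cudaTextureObject_t tex, int chunkBytes, unsigned *out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int words = chunkBytes / 4;
    int base = tid * words;
    unsigned h = 0;
    for (int i = 0; i < words; ++i) {
        char4 v = tex1Dfetch<char4>(tex, base + i);
        h += (unsigned char)v.x + (unsigned char)v.y
           + (unsigned char)v.z + (unsigned char)v.w;
    }
    out[tid] = h;
}

// Runs both variants on the same data, prints timings, and returns true if
// the per-thread sums agree. Assumes threads % 128 == 0 and chunkBytes % 4 == 0.
bool compareVariants(int threads, int chunkBytes)
{
    size_t n = (size_t)threads * chunkBytes;
    std::vector<unsigned char> host(n);
    for (size_t i = 0; i < n; ++i) host[i] = (unsigned char)(i * 31u + 7u);

    void *d = nullptr;
    unsigned *o1 = nullptr, *o4 = nullptr;
    cudaMalloc(&d, n);
    cudaMalloc(&o1, threads * sizeof(unsigned));
    cudaMalloc(&o4, threads * sizeof(unsigned));
    cudaMemcpy(d, host.data(), n, cudaMemcpyHostToDevice);
    cudaTextureObject_t t1 = makeLinearTex<char>(d, n);
    cudaTextureObject_t t4 = makeLinearTex<char4>(d, n);

    int block = 128, grid = threads / block;
    // Warm-up launches so one-time startup cost is not counted in the timings.
    hashChar<<<grid, block>>>(t1, chunkBytes, o1);
    hashChar4<<<grid, block>>>(t4, chunkBytes, o4);

    cudaEvent_t e0, e1, e2;
    cudaEventCreate(&e0); cudaEventCreate(&e1); cudaEventCreate(&e2);
    cudaEventRecord(e0);
    hashChar<<<grid, block>>>(t1, chunkBytes, o1);
    cudaEventRecord(e1);
    hashChar4<<<grid, block>>>(t4, chunkBytes, o4);
    cudaEventRecord(e2);
    cudaEventSynchronize(e2);
    float ms1 = 0.f, ms4 = 0.f;
    cudaEventElapsedTime(&ms1, e0, e1);
    cudaEventElapsedTime(&ms4, e1, e2);
    printf("char: %.3f ms, char4: %.3f ms\n", ms1, ms4);

    std::vector<unsigned> r1(threads), r4(threads);
    cudaMemcpy(r1.data(), o1, threads * sizeof(unsigned), cudaMemcpyDeviceToHost);
    cudaMemcpy(r4.data(), o4, threads * sizeof(unsigned), cudaMemcpyDeviceToHost);

    cudaEventDestroy(e0); cudaEventDestroy(e1); cudaEventDestroy(e2);
    cudaDestroyTextureObject(t1); cudaDestroyTextureObject(t4);
    cudaFree(d); cudaFree(o1); cudaFree(o4);
    return r1 == r4;
}
```

Calling something like `compareVariants(4096, 256)` from `main` runs both kernels over the same 1 MB buffer and checks that they produce the same sums. The per-thread chunk layout is only one possible access pattern; a different mapping of threads to bytes may change which variant is faster.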

Could the texture cache, by exploiting the locality of the memory references, be making the plain char version faster than the char4 version? Has anyone seen something similar?