Why is a vec4 local variable dumped to local memory?

I’m trying to load float4 and uchar4 values from pitched global memory into a register, but the value ends up in local memory for some reason.
This version is placed in local memory:

float4 val = *((float4 *)((char *)in + y*inStride) + x);
uchar4 val = *((uchar4 *)((char *)in + y*inStride) + x);

(i.e., val is in local memory instead of a register)
while this version is fine:

float4 val = *(in + y*inStride/sizeof(float4) + x);
uchar4 val = *(in + y*inStride/sizeof(uchar4) + x);

I’m guessing that the compiler decides that the first version may access misaligned memory and for some reason dumps the register into local memory. Is that true?
If so, does the documentation say this anywhere, and is there a more correct fix than assuming 256-byte memory alignment?
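
For reference, here is a minimal sketch of the two addressing forms above as standalone kernels, so the local memory usage of each can be compared. It rests on some assumptions: the stride term is read as y * inStride with inStride in bytes (as returned by cudaMallocPitch), the kernel names, the countMismatches helper, and the 64x16 image size are hypothetical, and only the float4 case is shown. The second kernel is only valid when the stride is a multiple of sizeof(float4).

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Form 1: byte-offset addressing through a char pointer.
__global__ void copyByteOffset(const float4 *in, size_t inStride,
                               float4 *out, size_t outStride,
                               int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float4 val = *((const float4 *)((const char *)in + y * inStride) + x);
    *((float4 *)((char *)out + y * outStride) + x) = val;
}

// Form 2: element-offset addressing.
// Assumption: both strides are multiples of sizeof(float4).
__global__ void copyElemOffset(const float4 *in, size_t inStride,
                               float4 *out, size_t outStride,
                               int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float4 val = *(in + y * inStride / sizeof(float4) + x);
    *(out + y * outStride / sizeof(float4) + x) = val;
}

// Hypothetical host helper: runs one of the kernels on a small pitched image
// and returns the number of mismatching elements, or -1 on a CUDA error.
int countMismatches(bool useByteOffset)
{
    const int width = 64, height = 16;
    const size_t rowBytes = width * sizeof(float4);

    float4 *dIn = nullptr, *dOut = nullptr;
    size_t inStride = 0, outStride = 0;
    if (cudaMallocPitch((void **)&dIn, &inStride, rowBytes, height) != cudaSuccess)
        return -1;
    if (cudaMallocPitch((void **)&dOut, &outStride, rowBytes, height) != cudaSuccess) {
        cudaFree(dIn);
        return -1;
    }

    std::vector<float4> hIn(width * height), hOut(width * height);
    for (int i = 0; i < width * height; ++i)
        hIn[i] = make_float4((float)i, i + 0.25f, i + 0.5f, i + 0.75f);

    // Host rows are tightly packed; device rows use the pitch.
    cudaMemcpy2D(dIn, inStride, hIn.data(), rowBytes, rowBytes, height,
                 cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    if (useByteOffset)
        copyByteOffset<<<grid, block>>>(dIn, inStride, dOut, outStride, width, height);
    else
        copyElemOffset<<<grid, block>>>(dIn, inStride, dOut, outStride, width, height);

    // The blocking copy also surfaces any error from the kernel launch.
    cudaError_t err = cudaMemcpy2D(hOut.data(), rowBytes, dOut, outStride,
                                   rowBytes, height, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < width * height; ++i)
        if (hIn[i].x != hOut[i].x || hIn[i].y != hOut[i].y ||
            hIn[i].z != hOut[i].z || hIn[i].w != hOut[i].w)
            ++mismatches;
    return mismatches;
}

int main()
{
    printf("byte-offset mismatches: %d\n", countMismatches(true));
    printf("elem-offset mismatches: %d\n", countMismatches(false));
    return 0;
}
```

Compiling this with nvcc -Xptxas -v (for example on a file named repro.cu, a hypothetical name) prints per-kernel register counts together with stack frame and spill store/load sizes, which should show whether either form actually uses local memory on a given toolkit and architecture.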