Is coalescing on Fermi an issue?

I have started writing code for Fermi cards, and it does not have to be compatible with compute capability 1.x devices.
I am wondering whether I should worry much about misaligned memory access, since it looks like a pain in the a**.
Does anyone have experience with the performance loss on a GTX 480 when not bothering with coalescing?
Thanks in advance.

There will be a performance hit, just as on a CPU when you load a cache line but only use a few bytes of it. However, the penalty is much smaller than it was on G80 hardware (there I once saw a 3x change in a 'real' code just by swapping blockIdx.x and blockIdx.y).