Bitmap representation for simple mask filtering

I have a bitmap and a 3x3 mask stored in a float array. Each thread computes one pixel of the result image.
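For reference, here is a minimal CUDA sketch of the setup described above. It assumes a single-channel `float` image stored row-major without padding, the mask kept in `__constant__` memory, and clamp-to-edge handling at the borders. `filter3x3`, `filterGray3x3`, `clampi` and `c_mask` are hypothetical names, and error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical 3x3 mask, row-major, in constant memory. In each loop iteration
// every thread reads the same coefficient, which constant memory broadcasts.
__constant__ float c_mask[9];

__host__ __device__ inline int clampi(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

// One thread computes one output pixel of a single-channel float image.
// Neighbours outside the image reuse the nearest edge pixel (clamp-to-edge).
__global__ void filter3x3(const float* in, float* out, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;  // the grid may overhang the image

    float sum = 0.0f;
    for (int dy = -1; dy <= 1; ++dy) {
        int yy = clampi(y + dy, 0, height - 1);
        for (int dx = -1; dx <= 1; ++dx) {
            int xx = clampi(x + dx, 0, width - 1);
            sum += c_mask[(dy + 1) * 3 + (dx + 1)] * in[yy * width + xx];
        }
    }
    out[y * width + x] = sum;
}

// Host wrapper: copies image and mask to the GPU, launches one thread per
// pixel, and copies the result back.
void filterGray3x3(const float* hIn, float* hOut, int width, int height,
                   const float mask[9]) {
    const size_t bytes = size_t(width) * height * sizeof(float);
    float *dIn, *dOut;
    cudaMalloc((void**)&dIn, bytes);
    cudaMalloc((void**)&dOut, bytes);
    cudaMemcpy(dIn, hIn, bytes, cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(c_mask, mask, 9 * sizeof(float));

    dim3 block(16, 16);
    // Round up so the grid covers the whole image.
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    filter3x3<<<grid, block>>>(dIn, dOut, width, height);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
}
```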

I need a bitmap representation that gives the best memory bandwidth. Should I store each pixel as an integer or as a float? That wastes 1 byte per pixel for a 24-bit bitmap, but it is good for coalesced memory access. And what about images whose width is not a multiple of the block width? Or should I use textures?
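One way to get the coalesced layout the question describes is to pad each 24-bit pixel to 4 bytes and read it as CUDA's built-in `uchar4` vector type. The sketch below assumes that layout, uses `cudaMallocPitch`/`cudaMemcpy2D` so every row starts at an aligned address whatever the image width, and rounds the grid size up so widths that are not a multiple of the block width are still covered; the surplus threads simply return. The names `filterRGBA3x3`, `filterRGB3x3`, `divUp`, `clampIdx`, `toByte` and `c_maskRGB` are hypothetical, and the 32x8 block shape and clamp-to-edge borders are assumptions rather than requirements.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Hypothetical 3x3 mask (row-major) in constant memory.
__constant__ float c_maskRGB[9];

// Number of blocks of size b needed to cover a elements (rounds up).
__host__ __device__ inline int divUp(int a, int b) { return (a + b - 1) / b; }

__host__ __device__ inline int clampIdx(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

// Round a filtered value to the nearest integer and saturate to 0..255.
__device__ inline unsigned char toByte(float v) {
    return (unsigned char)fminf(fmaxf(v + 0.5f, 0.0f), 255.0f);
}

// Each pixel is a uchar4 (R, G, B, unused): 4 bytes instead of 3, so a warp
// of 32 consecutive x-threads reads 32 consecutive 4-byte words of one row.
__global__ void filterRGBA3x3(const uchar4* in, size_t inPitch,
                              uchar4* out, size_t outPitch,
                              int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;  // threads past the image edge do nothing

    float r = 0.0f, g = 0.0f, b = 0.0f;
    for (int dy = -1; dy <= 1; ++dy) {
        int yy = clampIdx(y + dy, 0, height - 1);
        // Pitch is in bytes, so step between rows through a char pointer.
        const uchar4* row = (const uchar4*)((const char*)in + yy * inPitch);
        for (int dx = -1; dx <= 1; ++dx) {
            uchar4 p = row[clampIdx(x + dx, 0, width - 1)];
            float m = c_maskRGB[(dy + 1) * 3 + (dx + 1)];
            r += m * p.x; g += m * p.y; b += m * p.z;
        }
    }
    uchar4* outRow = (uchar4*)((char*)out + y * outPitch);
    outRow[x] = make_uchar4(toByte(r), toByte(g), toByte(b), 0);
}

// Host wrapper: packs 3-byte RGB into uchar4, filters on pitched device
// buffers, and unpacks back to 3 bytes per pixel. No error checking.
void filterRGB3x3(const unsigned char* rgbIn, unsigned char* rgbOut,
                  int width, int height, const float mask[9]) {
    std::vector<uchar4> h(size_t(width) * height);
    for (size_t i = 0; i < h.size(); ++i)
        h[i] = make_uchar4(rgbIn[3 * i], rgbIn[3 * i + 1], rgbIn[3 * i + 2], 0);
    cudaMemcpyToSymbol(c_maskRGB, mask, 9 * sizeof(float));

    uchar4 *dIn, *dOut;
    size_t pitchIn, pitchOut;
    const size_t rowBytes = size_t(width) * sizeof(uchar4);
    cudaMallocPitch((void**)&dIn, &pitchIn, rowBytes, height);
    cudaMallocPitch((void**)&dOut, &pitchOut, rowBytes, height);
    cudaMemcpy2D(dIn, pitchIn, h.data(), rowBytes, rowBytes, height,
                 cudaMemcpyHostToDevice);

    dim3 block(32, 8);  // blockDim.x = 32: each warp covers one row segment
    dim3 grid(divUp(width, (int)block.x), divUp(height, (int)block.y));
    filterRGBA3x3<<<grid, block>>>(dIn, pitchIn, dOut, pitchOut, width, height);

    cudaMemcpy2D(h.data(), rowBytes, dOut, pitchOut, rowBytes, height,
                 cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    for (size_t i = 0; i < h.size(); ++i) {
        rgbOut[3 * i] = h[i].x; rgbOut[3 * i + 1] = h[i].y; rgbOut[3 * i + 2] = h[i].z;
    }
}
```

Accumulating in `float` keeps the mask arithmetic the same as in a float-per-pixel version, while storing pixels as `uchar4` moves 4 bytes per pixel instead of 12 (three `float`s) or 16 (`float4`), which matters if the kernel is bandwidth-bound. Reading the input through a texture object with `cudaAddressModeClamp` is an alternative that moves the border clamping into hardware; that variant is not shown here.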

Could you help me?