How to load 5 floats efficiently?

I need a single thread to load and store 5 floats. Only selected threads in a block perform this operation. Should I group the 5 floats into one float4 plus one float, or is there a more efficient way to do it?
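For reference, here is a minimal sketch of the float4-plus-float split, assuming the data can be laid out as two separate arrays. The kernel name `scale5`, the array names `a` and `b`, the scale factor, and the "even threads only" selection rule are all hypothetical stand-ins, not anything from my actual code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical layout: the first four values of each item live in a float4
// array, the fifth value in a separate float array.
__global__ void scale5(float4* a, float* b, int n, float s)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Stand-in selection rule (assumption): only even-indexed threads do work.
    if (i >= n || (i & 1)) return;

    float4 v = a[i];  // float4 is 16-byte aligned, so this can be one 128-bit load
    float  w = b[i];  // separate 32-bit load

    v.x *= s; v.y *= s; v.z *= s; v.w *= s;
    w *= s;

    a[i] = v;         // can be one 128-bit store
    b[i] = w;         // separate 32-bit store
}

int main()
{
    const int n = 256;
    float4* a = nullptr;
    float*  b = nullptr;
    cudaMallocManaged(&a, n * sizeof(float4));
    cudaMallocManaged(&b, n * sizeof(float));
    for (int i = 0; i < n; ++i) {
        a[i] = make_float4(1.f, 2.f, 3.f, 4.f);
        b[i] = 5.f;
    }

    scale5<<<(n + 127) / 128, 128>>>(a, b, n, 2.f);
    cudaDeviceSynchronize();

    // Item 0 is processed (even index), item 1 is skipped (odd index).
    printf("a[0].x=%g b[0]=%g b[1]=%g\n", a[0].x, b[0], b[1]);

    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

Keeping the fifth value in its own array avoids padding: a struct holding a float4 and a float would be rounded up to 32 bytes because of the float4's 16-byte alignment.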

Never mind. I was able to reduce the number of floats that need to be loaded and stored from 5 to 4.
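With 4 floats per thread, each item fits in a single float4. A rough sketch, assuming the values are stored contiguously in a plain float array whose base address is 16-byte aligned (pointers returned by the CUDA allocation routines are). The names `scale4` and `data` and the "even threads only" rule are again hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical layout: item i occupies data[4*i] .. data[4*i + 3].
__global__ void scale4(float* data, int nItems, float s)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Stand-in selection rule (assumption): only even-indexed threads do work.
    if (i >= nItems || (i & 1)) return;

    // Valid only if data is 16-byte aligned; each item is then aligned too.
    float4* p = reinterpret_cast<float4*>(data) + i;

    float4 v = *p;    // all four floats in one vector load
    v.x *= s; v.y *= s; v.z *= s; v.w *= s;
    *p = v;           // all four floats in one vector store
}

int main()
{
    const int nItems = 256;
    float* data = nullptr;
    cudaMallocManaged(&data, nItems * 4 * sizeof(float));
    for (int i = 0; i < nItems * 4; ++i) data[i] = 1.f;

    scale4<<<(nItems + 127) / 128, 128>>>(data, nItems, 2.f);
    cudaDeviceSynchronize();

    // data[0..3] belong to item 0 (processed), data[4..7] to item 1 (skipped).
    printf("data[0]=%g data[4]=%g\n", data[0], data[4]);

    cudaFree(data);
    return 0;
}
```

If the array started at an address that is not a multiple of 16 bytes (for example, an offset into a larger buffer), the cast to float4* would not be safe and the loads would have to stay scalar.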