Global memory coalescing: data access problem

Hi everybody,
I would appreciate your ideas.
I have a problem with global memory access.
The inputs to my equation are 8-bit values, but the output is 32 bits:

Output += input1 * input2;
If I use "struct __align__(4) uchar4 { unsigned char u0, u1, u2, u3; };"
for input1 and input2, the computation becomes:

uchar4 input1, input2;
Output += input1.u0 * input2.u0;
Output += input1.u1 * input2.u1;
Output += input1.u2 * input2.u2;
Output += input1.u3 * input2.u3;
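
To make this concrete, here is a minimal sketch of the kind of kernel I mean. The names `Bytes4` and `dot4`, the array size, and the one-output-per-thread layout are assumptions for illustration only; I renamed the struct because CUDA already provides a built-in `uchar4` (with members x, y, z, w). The idea is that each thread reads one 4-byte aligned struct per input, which should let the compiler issue a single 32-bit load, so that consecutive threads touch consecutive 32-bit words.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical struct name, chosen to avoid clashing with CUDA's built-in uchar4.
struct __align__(4) Bytes4 {
    unsigned char u0, u1, u2, u3;
};

// Assumption: one 32-bit output per thread. Thread i reads in1[i] and in2[i],
// so neighbouring threads read neighbouring 4-byte words.
__global__ void dot4(const Bytes4* in1, const Bytes4* in2,
                     unsigned int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    Bytes4 a = in1[i];  // one 4-byte struct per thread
    Bytes4 b = in2[i];

    // 8-bit inputs, 32-bit accumulator.
    unsigned int sum = 0;
    sum += (unsigned int)a.u0 * b.u0;
    sum += (unsigned int)a.u1 * b.u1;
    sum += (unsigned int)a.u2 * b.u2;
    sum += (unsigned int)a.u3 * b.u3;
    out[i] = sum;
}

int main()
{
    const int n = 1024;
    Bytes4 h1[n], h2[n];
    unsigned int hout[n];

    // Fill both inputs with the largest 8-bit value.
    Bytes4 v = {255, 255, 255, 255};
    for (int i = 0; i < n; ++i) { h1[i] = v; h2[i] = v; }

    Bytes4 *d1, *d2;
    unsigned int* dout;
    cudaMalloc((void**)&d1, n * sizeof(Bytes4));
    cudaMalloc((void**)&d2, n * sizeof(Bytes4));
    cudaMalloc((void**)&dout, n * sizeof(unsigned int));
    cudaMemcpy(d1, h1, n * sizeof(Bytes4), cudaMemcpyHostToDevice);
    cudaMemcpy(d2, h2, n * sizeof(Bytes4), cudaMemcpyHostToDevice);

    dot4<<<(n + 255) / 256, 256>>>(d1, d2, dout, n);
    cudaMemcpy(hout, dout, n * sizeof(unsigned int), cudaMemcpyDeviceToHost);

    printf("%u\n", hout[0]);  // 4 * 255 * 255 = 260100

    cudaFree(d1);
    cudaFree(d2);
    cudaFree(dout);
    return 0;
}
```

With all inputs at 255 the result per thread is 260100, which does not fit in 8 or 16 bits, so the output really does need to be 32 bits wide.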

How can I make these global memory accesses coalesced?