Thread cooperative addition

If each thread in a block has one float variable, how do I efficiently compute the sum of those variables?

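This is called a parallel reduction. Each thread writes its value to shared memory, and the threads then add the values pairwise, so a block of n threads needs about log2(n) steps instead of n. Take a look at the reduction sample in the SDK.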
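To give a rough idea of what a block-level reduction looks like, here is a minimal sketch. It is not the SDK sample's code, just the basic shared-memory tree reduction. The names `blockSum`, `sumOneBlock` and `BLOCK_SIZE` are made up for illustration, and it assumes the block size is a power of two and that each thread's float is read from an input array. Error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>

// Assumption: threads per block, must be a power of two for this sketch.
constexpr int BLOCK_SIZE = 256;

// Each thread contributes one float; thread 0 writes the block's sum to out.
__global__ void blockSum(const float *in, float *out)
{
    __shared__ float partial[BLOCK_SIZE];

    const unsigned int tid = threadIdx.x;

    // Stand-in for "the float variable each thread has".
    partial[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();

    // Tree reduction: the number of active threads halves at every step.
    for (unsigned int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) {
            partial[tid] += partial[tid + stride];
        }
        // Outside the if, so every thread in the block reaches the barrier.
        __syncthreads();
    }

    if (tid == 0) {
        out[blockIdx.x] = partial[0];
    }
}

// Hypothetical host helper: sums exactly BLOCK_SIZE floats with one block.
// CUDA error checking is omitted to keep the sketch short.
float sumOneBlock(const float *h_in)
{
    float *d_in = nullptr;
    float *d_out = nullptr;
    cudaMalloc(&d_in, BLOCK_SIZE * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, h_in, BLOCK_SIZE * sizeof(float), cudaMemcpyHostToDevice);

    blockSum<<<1, BLOCK_SIZE>>>(d_in, d_out);

    // This blocking copy waits for the kernel to finish before reading.
    float h_sum = 0.0f;
    cudaMemcpy(&h_sum, d_out, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return h_sum;
}
```

Calling `sumOneBlock` on a host array of `BLOCK_SIZE` floats returns their sum. With more than one block, each block writes its own partial sum to `out[blockIdx.x]`, and those partial sums still have to be added together, for example with a second kernel launch. The SDK sample goes further than this sketch and is the better reference if performance matters.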