If each thread in a block has one float variable, how do I efficiently compute the sum of those variables?
This is called a parallel reduction. Take a look at the reduction sample in the CUDA SDK.
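Here is a minimal sketch of the basic idea, not the SDK sample itself. It assumes a single block of at most 1024 threads, with each thread contributing one float. Every thread writes its value into shared memory. Then, on each step, the lower half of the active threads adds in the value from the upper half. This takes log2(blockDim.x) steps instead of a serial loop. The names `blockSumKernel` and `blockSum` are hypothetical, and CUDA error checking is left out for brevity.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Tree reduction in shared memory: sums one float per thread within one block.
// Assumes blockDim.x is a power of two (the host wrapper below guarantees this).
__global__ void blockSumKernel(const float* in, float* out, int n)
{
    extern __shared__ float sdata[];   // sized at launch: blockDim.x floats
    int tid = threadIdx.x;

    // Each thread loads its own value; padding threads contribute 0.
    sdata[tid] = (tid < n) ? in[tid] : 0.0f;
    __syncthreads();

    // Halve the number of active threads each step (sequential addressing).
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) {
            sdata[tid] += sdata[tid + s];
        }
        __syncthreads();   // every partial sum must be written before the next step
    }

    if (tid == 0) {
        *out = sdata[0];   // thread 0 holds the block's total
    }
}

// Hypothetical host helper: copies n floats to the device, reduces them
// in a single block, and returns the sum. Assumes 1 <= n <= 1024.
float blockSum(const float* hostIn, int n)
{
    assert(n >= 1 && n <= 1024);

    // Round the thread count up to a power of two for the tree reduction.
    int threads = 1;
    while (threads < n) threads <<= 1;

    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, sizeof(float));
    cudaMemcpy(dIn, hostIn, n * sizeof(float), cudaMemcpyHostToDevice);

    // The third launch parameter sets the dynamic shared-memory size.
    blockSumKernel<<<1, threads, threads * sizeof(float)>>>(dIn, dOut, n);

    float result = 0.0f;
    cudaMemcpy(&result, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

The `__syncthreads()` inside the loop is what keeps the tree correct. Without it, a thread could read `sdata[tid + s]` before another thread has finished writing it. Note that the result can differ slightly from a serial sum because floating-point addition is not associative. For larger inputs, or for additional tricks such as unrolling the last steps, the SDK sample covers several progressively optimized versions of this idea.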