I have a __global__ function (F) that performs a reduction on an array (A). After this global reduction, I do some other operations that need the result of the reduction.
    __global__ void F(type *A)
    {
        type redresult;
        redresult = globalReduction(A);
        // here all threads need to wait until redresult is ready
        type alpha;
        alpha = redresult + …; // some operation needing the result
    }
My question is: how can I make all threads wait until redresult is ready? Is __threadfence() useful here?
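To make the question concrete, here is a minimal sketch of one alternative I am aware of: splitting F into two kernels, so that the kernel boundary itself acts as the grid-wide wait (kernels launched into the same stream run in order). As far as I understand, __threadfence() only orders a thread's memory writes and does not make any thread wait. The names reduceSum, useResult and reduceThenUse, the float type, the sum reduction, and the block size of 256 are all assumptions for illustration, not my actual code.

```cuda
#include <cuda_runtime.h>

#define BLOCK_SIZE 256  // assumed block size; must be a power of two

// Hypothetical stand-in for globalReduction: sums A[0..n) into *result.
// *result must be zeroed before launch.
__global__ void reduceSum(const float *A, int n, float *result)
{
    __shared__ float partial[BLOCK_SIZE];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    partial[tid] = (i < n) ? A[i] : 0.0f;
    __syncthreads();

    // Tree reduction within the block.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) partial[tid] += partial[tid + s];
        __syncthreads();
    }

    // One atomic per block combines the per-block partial sums.
    if (tid == 0) atomicAdd(result, partial[0]);
}

// Second half of F: launched after reduceSum in the same stream,
// so every thread sees the finished reduction result.
__global__ void useResult(const float *A, int n, const float *result, float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = *result + A[i];  // placeholder for "redresult + ..."
}

// Host helper (hypothetical): fills A with ones, runs both kernels,
// and returns out[0].
float reduceThenUse(int n)
{
    float *A, *result, *out;
    cudaMallocManaged(&A, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    cudaMallocManaged(&result, sizeof(float));

    for (int i = 0; i < n; ++i) A[i] = 1.0f;
    *result = 0.0f;

    int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    reduceSum<<<blocks, BLOCK_SIZE>>>(A, n, result);
    useResult<<<blocks, BLOCK_SIZE>>>(A, n, result, out);  // starts after reduceSum ends
    cudaDeviceSynchronize();  // wait before reading managed memory on the host

    float first = out[0];
    cudaFree(A);
    cudaFree(out);
    cudaFree(result);
    return first;
}
```

Calling reduceThenUse(1024) from main would run the reduction over 1024 ones and then add the sum to each element. What I would like to know is whether the same wait can be achieved inside a single kernel.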
Thanks.