Hello everyone
I have a problem.
I wrote a CUDA program in which the float atomic add takes almost half of the running time.
My device is a 9800 GTX, and it has no atomic add for floats (only for integers), so I can't use a built-in atomic add.
I implemented a float atomic add myself.
Here is the code:
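__device__ void atomicaddfloat(float *pa, int *atomicadd, float &b)
{
    // *atomicadd is a lock word: 0 means free, nonzero means held.
    bool waiting = true;
    do {
        // Only the thread that sees the old value 0 owns the lock.
        if (atomicAdd(atomicadd, 1) == 0)
        {
            waiting = false;
            *pa = *pa + b;
            // Release the lock, discarding increments from waiting threads.
            *atomicadd = 0;
        }
    } while (waiting);
}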
Using a secondary integer array (atomicadd) as locks, the function works.
But its performance is not satisfactory for me.
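One alternative I could compare against is a compare-and-swap loop on the float's bit pattern, which would avoid the separate lock array. This assumes the device supports 32-bit atomicCAS on global memory (as far as I know, compute capability 1.1 does, but I have not measured whether it is actually faster). A minimal sketch follows; the names atomicAddFloatCas, sumOnes and runSumOnes are hypothetical, and error checking is omitted:

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (the name is an assumption): adds val to *addr
// atomically by retrying a 32-bit compare-and-swap on the float's bits.
// Returns the value stored at addr before the addition.
__device__ float atomicAddFloatCas(float *addr, float val)
{
    int *addr_as_int = (int *)addr;
    int old = *addr_as_int;
    int assumed;
    do {
        assumed = old;
        // Compute the new sum from the value we believe is stored...
        int desired = __float_as_int(__int_as_float(assumed) + val);
        // ...and store it only if *addr has not changed in the meantime.
        old = atomicCAS(addr_as_int, assumed, desired);
    } while (old != assumed);  // compared as ints, so NaN cannot loop forever
    return __int_as_float(old);
}

// Test kernel: every thread adds 1.0f to the same accumulator.
__global__ void sumOnes(float *total)
{
    atomicAddFloatCas(total, 1.0f);
}

// Host-side driver: launches the kernel and returns the accumulated total.
float runSumOnes(int blocks, int threadsPerBlock)
{
    float *dTotal = NULL;
    float hTotal = 0.0f;
    cudaMalloc((void **)&dTotal, sizeof(float));
    cudaMemcpy(dTotal, &hTotal, sizeof(float), cudaMemcpyHostToDevice);
    sumOnes<<<blocks, threadsPerBlock>>>(dTotal);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&hTotal, dTotal, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dTotal);
    return hTotal;
}
```

Calling runSumOnes(4, 256) makes 1024 threads contend on one address, which is the worst case for either approach. With this version a thread only retries when another thread's update actually landed in between, and no thread ever holds a lock.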
Can anybody help me?