Hi all,

I'm trying to implement a simple multiplication algorithm for big numbers in CUDA, but I'm running into a strange problem at run time.

Here is my code:

```
__device__ void Multiplication(mot *x, const DWORD longueurx0,
                               mot *y, const DWORD longueury0,
                               mot *z, mot *Carry)
{
    short i, j;
    short Deb, Fin;
    doublemot AddHi, Temp;
    mot AddLo;
    short longueurx, longueury;

    longueurx = (short) longueurx0;
    longueury = (short) longueury0;
    AddLo = 0;
    AddHi = 0;
    for (j = 0; j < (longueurx + longueury - 1); j++)
    {
        Fin = (short) MIN(longueurx - 1, j);
        Deb = (short) MAX(0, j - longueury + 1);
        for (i = Deb; i <= Fin; i++)
        {
            Temp = (doublemot)(x[i]) * (doublemot)(y[j - i]) + (doublemot)AddLo;
            AddLo = (mot)(Temp);
            AddHi += (Temp >> LG_MOT);
        }
        Carry[j + 1] += (mot)(AddHi);
        Carry[j + 2] += (mot)(AddHi >> LG_MOT);
        z[j] = AddLo;
        AddLo = 0;
        AddHi = 0;
    }
    Addition(z, longueurx + longueury, Carry, longueurx + longueury, z);
}
```

For information, the “mot” type is unsigned int and the “doublemot” type is unsigned long long int.
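To make the word arithmetic in the inner loop concrete, here is a minimal sketch of a single multiply-and-add step. The typedefs mirror the types stated above; the value `LG_MOT = 32` and the helper name `MulAddWord` are assumptions made for illustration and are not taken from the posted code. The largest possible value, (2^32 - 1)^2 + (2^32 - 1) = 2^64 - 2^32, still fits in a `doublemot`, so this step on its own cannot overflow.

```cpp
#include <climits>

// Types as described in the post.
typedef unsigned int mot;               // one word
typedef unsigned long long doublemot;   // double word

// Assumption: a word is 32 bits wide (not stated in the post).
#define LG_MOT 32
static_assert(sizeof(mot) * CHAR_BIT == LG_MOT, "mot is expected to be 32 bits");
static_assert(sizeof(doublemot) * CHAR_BIT == 2 * LG_MOT, "doublemot is expected to be 64 bits");

// Hypothetical helper: computes a * b + addLo and splits the 64-bit result
// into its low word (*lo) and high word (*hi).
static inline void MulAddWord(mot a, mot b, mot addLo, mot *lo, mot *hi)
{
    doublemot t = (doublemot)a * (doublemot)b + (doublemot)addLo;
    *lo = (mot)t;               // low 32 bits
    *hi = (mot)(t >> LG_MOT);   // high 32 bits
}
```

With all three inputs at their maximum value, the low word comes out as 0 and the high word as 0xFFFFFFFF.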

When I run it, the value of “z” is always correct before the final Addition, but the value of “Carry” is never correct; moreover, it changes from one run to the next.
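One thing visible in the posted loop is that Carry is only ever updated with `+=`, so its final contents depend on whatever the array held on entry; that may be one source of the changing values. As a point of comparison, here is a minimal reference sketch of the same column-by-column product that can be run on the host and checked against the GPU result. It is not the posted method: it replaces the Carry array with a running 64-bit carry, and the names `MultiplicationRef` and `HD`, as well as `LG_MOT = 32`, are assumptions for illustration. Words are stored least significant first, as in the posted code.

```cpp
#include <cassert>

// Lets the same function compile with nvcc (host and device) or a plain C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

typedef unsigned int mot;               // one word, as in the post
typedef unsigned long long doublemot;   // double word, as in the post
#define LG_MOT 32                       // assumption: bits per word

// Hypothetical reference: z = x * y, where z must have room for lx + ly words.
// Every word of z is written, so z does not need to be initialized by the caller.
HD void MultiplicationRef(const mot *x, int lx, const mot *y, int ly, mot *z)
{
    assert(lx > 0 && ly > 0);
    doublemot carry = 0;   // carry out of the previous column (may exceed one word)
    for (int j = 0; j < lx + ly - 1; j++) {
        int fin = (lx - 1 < j) ? lx - 1 : j;           // MIN(lx - 1, j)
        int deb = (j - ly + 1 > 0) ? j - ly + 1 : 0;   // MAX(0, j - ly + 1)
        mot lo = (mot)carry;                 // start the column from the incoming carry
        doublemot hi = carry >> LG_MOT;
        for (int i = deb; i <= fin; i++) {
            doublemot t = (doublemot)x[i] * (doublemot)y[j - i] + (doublemot)lo;
            lo = (mot)t;                     // low word of the running column sum
            hi += t >> LG_MOT;               // accumulate everything above the low word
        }
        z[j] = lo;
        carry = hi;                          // passed on to column j + 1
    }
    z[lx + ly - 1] = (mot)carry;             // the final carry fits in one word
}
```

For example, squaring the two-word value {0xFFFFFFFF, 0xFFFFFFFF} (that is, 2^64 - 1) gives the four words {1, 0, 0xFFFFFFFE, 0xFFFFFFFF}, which is 2^128 - 2^65 + 1.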

I tried calling `__syncthreads()`, but since I don’t have any `__shared__` variables for the moment, I think it’s useless here.

As a first step, I just want a correct result for this multiplication; after that I’ll look at what to do with the loops using “threadIdx.x” and “blockIdx.x” (I’m just getting started with CUDA :) ).

Thanks for your help!

Laurent.