Big number multiplication

Hi all,

I am trying to implement a simple multiplication algorithm for big numbers in CUDA, but I get a strange problem at run time.

This is my code:

__device__ void Multiplication(mot *x, const DWORD longueurx0,
                               mot *y, const DWORD longueury0,
                               mot *z, mot *Carry)
{
    short  i, j;
    short  Deb, Fin;
    doublemot  AddHi, Temp;
    mot AddLo;
    short  longueurx, longueury;

    longueurx = (short) longueurx0;
    longueury = (short) longueury0;

    AddLo = 0;
    AddHi = 0;

    for (j = 0; j < (longueurx + longueury - 1); j++)
    {
        Fin = (short) MIN(longueurx - 1, j);
        Deb = (short) MAX(0, j - longueury + 1);

        for (i = Deb; i <= Fin; i++)
        {
            Temp = (doublemot)(x[i]) * (doublemot)(y[j - i]) + (doublemot)AddLo;
            AddLo = (mot)(Temp);
            AddHi += (Temp >> LG_MOT);
        }

        Carry[j + 1] += (mot)(AddHi);
        Carry[j + 2] += (mot)(AddHi >> LG_MOT);
        z[j] = AddLo;

        AddLo = 0;
        AddHi = 0;
    }

    Addition(z, longueurx + longueury, Carry, longueurx + longueury, z);
}

For information, the “mot” type is unsigned int and the “doublemot” type is unsigned long long int.

When I run this, the “z” value is always correct before the final Addition, but the “Carry” value is never correct, and moreover it changes from one run to the next.
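To check the device result, it can help to have a host-side reference to compare against. The following is a minimal sketch in plain C, not the code from this post: it follows the same column-by-column scheme but keeps the carry in a three-word accumulator instead of a separate Carry array, so no buffer has to be zeroed beforehand. The name multiplication_ref, the fixed-width typedefs and LG_MOT = 32 are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t mot;        /* assumed: "unsigned int" is 32 bits wide */
typedef uint64_t doublemot;  /* assumed: "unsigned long long" is 64 bits wide */
#define LG_MOT 32            /* assumed word size in bits */

/* Hypothetical host-side reference: z = x * y.
   Words are stored least significant first.
   z must have room for lx + ly words. */
static void multiplication_ref(const mot *x, size_t lx,
                               const mot *y, size_t ly, mot *z)
{
    mot c0 = 0, c1 = 0, c2 = 0;   /* three-word column accumulator */

    assert(lx >= 1 && ly >= 1);

    for (size_t j = 0; j + 1 < lx + ly; j++) {
        size_t fin = (j < lx - 1) ? j : lx - 1;        /* MIN(lx-1, j)   */
        size_t deb = (j + 1 > ly) ? j + 1 - ly : 0;    /* MAX(0, j-ly+1) */

        for (size_t i = deb; i <= fin; i++) {
            doublemot t = (doublemot)x[i] * (doublemot)y[j - i];
            doublemot s;

            /* Add the 64-bit product into (c2, c1, c0) with carry. */
            s  = (doublemot)c0 + (mot)t;
            c0 = (mot)s;
            s  = (doublemot)c1 + (t >> LG_MOT) + (s >> LG_MOT);
            c1 = (mot)s;
            c2 += (mot)(s >> LG_MOT);
        }

        z[j] = c0;   /* column j is complete */
        c0 = c1;     /* shift the accumulator down by one word */
        c1 = c2;
        c2 = 0;
    }
    z[lx + ly - 1] = c0;   /* top word of the product */
}
```

Called with x = y = {0xFFFFFFFF, 0xFFFFFFFF}, this should give the four words 0x00000001, 0x00000000, 0xFFFFFFFE, 0xFFFFFFFF, which is (2^64 - 1)^2 stored least significant word first.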

I tried calling __syncthreads(), but since I don’t use any shared memory for the moment, I think it is useless here.

For a start, I just want to get a correct result for this multiplication; then I will see what to do with the loops that involve “threadIdx.x” and “blockIdx.x” (I am just starting with CUDA :) ).

Thanks for your help!

Laurent.

Does nobody have any idea about this problem?

It seems to be the “+=” on Carry[j+…] that doesn’t work as I expected…

Threads can be executed in any order, and you have multiple threads reading/writing to the same memory location. Thus there are race conditions in your code.

I agree with that, but I wonder why the problem always appears with “Carry” and never with “z”…
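One possible explanation, sketched here in plain C rather than CUDA: in the posted routine z is written with a plain store (z[j] = AddLo), so every thread that runs the function on the same buffers writes the same value and the outcome does not depend on how many threads ran. Carry is updated with “+=”, a read-modify-write whose result depends on what is already in memory. The sketch below imitates several threads by running the same pass several times in a row; one_thread_pass and simulate are hypothetical names, and the stored values are placeholders, not the real partial products. On the GPU the passes would also overlap in an unpredictable order, which would make the Carry value change from run to run. A Carry buffer that is not zeroed before the call would have a similar effect; the post does not show how it is initialized, so that part is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t mot;   /* assumed: 32-bit word, as in the post */

/* Stand-in for what each thread does in the posted routine:
   a plain store into z and an accumulation into carry. */
static void one_thread_pass(mot *z, mot *carry, size_t n)
{
    for (size_t j = 0; j < n; j++) {
        z[j] = (mot)(j + 1);        /* plain store: same value on every pass */
        carry[j] += (mot)(j + 1);   /* accumulation: depends on previous content */
    }
}

/* Zero both buffers, then run the pass 'passes' times on them,
   as if 'passes' threads happened to execute one after another. */
static void simulate(mot *z, mot *carry, size_t n, unsigned passes)
{
    memset(z, 0, n * sizeof *z);
    memset(carry, 0, n * sizeof *carry);
    for (unsigned t = 0; t < passes; t++)
        one_thread_pass(z, carry, n);
}
```

Comparing simulate(z, carry, n, 1) with simulate(z, carry, n, 4), z is identical in both cases while every carry[j] is four times larger in the second. If this is indeed the cause, letting a single thread perform the whole multiplication, or giving each thread its own z and Carry buffers, should make the result stable.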