Is it possible to implement big integers efficiently?

I am trying to estimate whether a project is feasible in CUDA. The project is to implement the quadratic sieve (QS).

The quadratic sieve is an integer factoring algorithm. The integer to be factored exceeds the range of the long long type, so I need to write a big integer library.

As far as I know, the add and multiply functions are simple to implement with the help of __syncthreads(). An easy way is to have each thread handle only one element of the array.
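
To make the one-element-per-thread idea concrete, here is a minimal sketch of the addition in plain C++ rather than as a kernel. The names `Limbs` and `big_add` and the base-2^32 little-endian layout are assumptions for illustration only. Phase 1 is the part that each thread could do independently for its own index; in a single-block kernel a __syncthreads() would sit between the two phases. Phase 2 is written here as a simple sequential ripple, which is the part that does not parallelise trivially.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical representation: little-endian limbs in base 2^32
// (limb 0 is the least significant).
using Limbs = std::vector<uint32_t>;

// Returns a + b. Both inputs must have the same number of limbs;
// the result has one extra limb to hold the final carry.
Limbs big_add(const Limbs& a, const Limbs& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    Limbs sum(n + 1, 0);
    std::vector<uint32_t> carry(n + 1, 0);

    // Phase 1: every iteration is independent of the others,
    // so on a GPU each thread could handle one index i.
    for (std::size_t i = 0; i < n; ++i) {
        uint64_t t = static_cast<uint64_t>(a[i]) + b[i];
        sum[i] = static_cast<uint32_t>(t);            // low 32 bits
        carry[i + 1] = static_cast<uint32_t>(t >> 32); // carry into next limb
    }

    // Phase 2: fold the carries in. Done here as a sequential ripple,
    // because a carry can travel across many limbs.
    uint32_t c = 0;
    for (std::size_t i = 0; i <= n; ++i) {
        uint64_t t = static_cast<uint64_t>(sum[i]) + carry[i] + c;
        sum[i] = static_cast<uint32_t>(t);
        c = static_cast<uint32_t>(t >> 32);
    }
    return sum;
}
```

For example, adding 1 to the two-limb value with both limbs equal to 0xFFFFFFFF gives limbs 0, 0, 1, so the carry has to travel through every limb.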

However, I am stuck on implementing the division and modulo functions. I can’t think of a good way to write them, because division can only be done sequentially.
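
The sequential dependency shows up even in the simplest case, division of a big integer by a single-word divisor. The sketch below is plain C++ under assumed names (`Limbs`, `divmod_small`) and an assumed base-2^32 little-endian layout; it is not a full multi-word division. Each step needs the remainder left over from the more significant limb, so the loop cannot simply be split with one limb per thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical representation: little-endian limbs in base 2^32.
using Limbs = std::vector<uint32_t>;

// Schoolbook short division: divides n by a single-limb divisor d.
// Stores the quotient in q (same length as n) and returns the remainder.
uint32_t divmod_small(const Limbs& n, uint32_t d, Limbs& q) {
    assert(d != 0);
    q.assign(n.size(), 0);
    uint64_t rem = 0;
    // Walk from the most significant limb down to limb 0.
    for (std::size_t i = n.size(); i-- > 0; ) {
        // rem < d, so (rem << 32) | n[i] fits in 64 bits
        // and the partial quotient fits in 32 bits.
        uint64_t cur = (rem << 32) | n[i];
        q[i] = static_cast<uint32_t>(cur / d);
        rem = cur % d;  // the next iteration depends on this value
    }
    return static_cast<uint32_t>(rem);
}
```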

If you have a better idea, please tell me. Thanks.

GPUMP is one project to do this, from Kaiyong Zhao who is on this forum.
I can’t find a link to an actual project page, though, just the paper.

Another option is to keep the multi-precision operations within each thread and extract some other parallelism from the problem on hand.
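
A rough sketch of what that could look like, in plain C++ with assumed names (`Limbs`, `mod_small`, `residues`): each logical thread runs the whole sequential multi-precision routine on its own input, and the parallelism comes from having many inputs. As an assumed example of such a sub-problem, the code reduces one big integer modulo each small prime in a list, the kind of per-prime work a sieve's factor base would provide. The host loop over `t` only stands in for the thread index of a kernel launch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical representation: little-endian limbs in base 2^32.
using Limbs = std::vector<uint32_t>;

// Work for ONE logical thread: the whole multi-precision reduction
// n mod p runs sequentially inside that thread.
uint32_t mod_small(const Limbs& n, uint32_t p) {
    assert(p != 0);
    uint64_t rem = 0;
    for (std::size_t i = n.size(); i-- > 0; ) {
        rem = ((rem << 32) | n[i]) % p;  // rem < p keeps this within 64 bits
    }
    return static_cast<uint32_t>(rem);
}

// Stand-in for a kernel launch: iteration t plays the role of thread t,
// and no iteration depends on any other.
std::vector<uint32_t> residues(const Limbs& n,
                               const std::vector<uint32_t>& primes) {
    std::vector<uint32_t> out(primes.size());
    for (std::size_t t = 0; t < primes.size(); ++t) {
        out[t] = mod_small(n, primes[t]);
    }
    return out;
}
```

With this layout no synchronisation between threads is needed, at the cost of each thread doing more sequential work.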

Ok, thanks for your information, I will look into it.
