Bignum values possible on cuda devices?

I’m considering implementing a factoring method known as the Quadratic Sieve in parallel with CUDA for my cryptology project. The method would factor numbers several hundred digits long. I’m looking into libraries like GNU MP Bignum (GMP) and LiDIA to represent these large numbers. Does anyone know whether I would be able to pass these large values over to a CUDA device for computations?

Thanks in advance!

Not trivially, as they use custom, mostly assembler-based routines to implement the operations on these big numbers. I personally have a CUDA- or OpenCL-accelerated bignum library on my private TODO list, but it’s low priority and constantly gets swapped out… So you sure can move the data to the GPU, but you would have to implement all the operations yourself.

The interesting thing is that multiplication of really large numbers, for example, can be done in Fourier space, and GPUs are good at FFTs…

Thanks for the quick response. How difficult would it be to implement such a library? Would it have to be in assembly language or some tricked out C code? I’ve never done anything that low level before.

You should be able to do all of that in C/C++. The tricky question is at what level to employ the parallelism: have one block operate on one set of numbers? I doubt the numbers get so large that using the whole device for a single operation pays off.

I see the dilemma. You would have to launch a new device kernel every time you wanted to do arithmetic.

Do you mean use the quadratic sieve to factor a number hundreds of digits long, or to factor hundreds of numbers each a few digits long? The latter is an interesting problem and the focus of a lot of current research (usually the interest is in ECM factoring and not QS), the former is essentially impossible.

Thanks for the reply jasonp. For our project we have to implement some cryptology method. I told my professor I’d like to do something in parallel with CUDA and she recommended the quadratic sieve. I may have misinterpreted her, but I believe she said it requires very large numbers, several hundred digits long. But if it’s the other way around (hundreds of numbers each a few digits long), then implementing the quadratic sieve should be feasible. I’ll verify.

Are you saying that implementing ECM is impossible? I looked it up and it appears someone has done it to some extent.

That seems like a fun project to take on as well :)

No, just the opposite; most of the intermediate quantities used in the quadratic sieve are about the size of the number you want to factor, but most of the runtime in a good QS code should be spent sieving and not performing multiple precision arithmetic. However, if the number to be factored is hundreds of digits long, all the GPUs in the world won’t help you get QS to complete the factorization. For practical purposes QS tops out factoring numbers around 90 digits in size before it’s better to switch to the number field sieve.

Bear in mind that sieve methods rely on low-latency random access to large arrays of memory, whereas GPUs provide huge bandwidth, not low latency.

ECM on numbers hundreds of digits long is both easy and trivial to parallelize; see Dan Bernstein’s work on ECM using CUDA. If you’re new to both factoring and CUDA, I’d suggest starting there. You still need some multiple-precision arithmetic, and all the runtime will go into it, but running different elliptic curves in parallel is easy. The downside is that you probably will never find factors over about 60 digits in size, which means you’ll never get to break a decent size RSA key.