 # Factor "RSA like" 100 digits number

We had a assignment in math where we used Maple to learn a tiny bit about algorithm and encryption and one of the bonus goal was to decrypt another’s student message by cracking his n, n being the product of two ~50 digits prime number, p and q.
Maple being poorly optimize and working in serial was nowhere close to be able to factor the 100 digits number in over 18h of computing.

Since I know that parallel computing is a better approach and that I have a GTX570, I thought I could kill 2 bird with one stone and learn a bit of CUDA and finding the Teacher n. (If you find this one you can find the n of every other students)

I’ll like to know how to translate this algorithm in CUDA and parallelize it or if there would be a better one as that one was the best I could do with the program I had.

p= 50 digits prime (not know)
q= 50 digits prime (not know)
n= p*q (know)

p= squareroot(n) lower to last odd Interger
q= n/p

while q is not an integer
p= p-2
q= n/q
return/print p and q

I have a bit of experience with Java(Netbeans) not C/C++ unfortunately.

Also would anyone have an estimate on how long it would take for one GTX570/i5-2500k?

Let’s assume we represent the operands as multiple of 32-bit words (since the GPU has a 32-bit architecture). This means for the trial division we are looking at a 352-bit dividend. Based on the number of instructions used by CUDA’s 64-bit division (around 70), it seems that a 352-bit long-integer division could be accomplished with about 2000 instructions, about half of which would be integer multiplies or multiply-adds. The instruction throughput of a GTX570 for this instruction mix may be around 200e9 instructions per second. So we would be able to perform on the order of 1e8 trial divisions per second. Quickly eliminating multiples of 2, 3, and 5 as possible candidates for p leaves about 1e49 divisors to examine, so it would take 1e41 seconds to scan the entire divisor search space. How much calendar time this represents is left as an exercise for the reader :-)

Note that the above is an example of a back-of-the-envelope calculation using very rough estimates for all quantities involved.

Do you really need to perform a full division each time? Proceeding from one divisor to the next would just need an addition to apply the delta, and a few subtractions to test possible results.

Any further optimizations I’ll leave to experts in the field of cryptography.

I have no hands-on experience with factorization. Wikipedia provides a quick overview that could serve as the starting point for a literature search: http://en.wikipedia.org/wiki/Integer_factorization

Just 3.17e41 years, still a lot faster than what I had.
As we already know that the number has only 2 factor of ~50 digits, I only need to search between 1e44 and n½, I don’t need to cheek 2, 3, 5…

p= 50 digits prime (not know)
q= 50 digits prime (not know)
n= p*q (know)

p= squareroot(n) lower to last odd Interger
q= n/p

while q =/= 0
{
p= p-2
q= n%q
}
q=n/q
return/print p and q

I did not mean to suggest trying division by 2, 3, or 5. I was suggesting that instead of iterating through all odd numbers (thus eliminating multiples of 2), use a slightly more complicated pattern of decrements to skip over multiples of 2, 3, and 5, which should reduce the number of candidates by about a factor of 10. To first order, there are 1e50 50-digit numbers, so this gets us down to 1e49 candidates.

If I can identify the digits of a number easily than yes it would probably speed this up a lot.

if (lastdigit == 5)
{}
else if (sumdigits%3 == 0)
{}
else
q = n%p

And concerning your previous post, can I use CUDA like netbeans or is it a lot more complicated?
Because I guess the IDE isn’t going to transform it in parallel for me.

This is going to be a very hard first project with CUDA, especially if you don’t know C or C++. (NetBeans is an IDE and CUDA C/C++ is a programming language, so I have no idea how one would compare those two things.)

The CUDA compiler does not automatically parallelize code for you. You have to write special code to run on the GPU and code to run the CPU that launches the GPU code and collects the results.

I’m also confused as to what the ultimate goal is here. Are you trying to learn CUDA or factor a 100 digit number?

The main problem with your original approach is not the serial vs. parallel question, but the use of a very slow algorithm. If you want to factor a 100 digit number in a reasonable time, the state of the art is to use the one of the number field sieve algorithms. According to this page:

A 100 digit number can be factored in a matter of hours with a single CPU. NFS is not very easy, but I think is one of the few tractable approaches to factoring a number that large.

I say CUDA because I don’t know the name of the IDE, I know it’s not the same thing but sorry for confusion.

Found a program on the wiki page and this may do the job, I’ll give it at less 3 hours. (And it seems it might do it)