Running Miller-Rabin test to find big primes

I am in a Crypto Class where we were given a challenge for who could generate the biggest prime using MR test. Someone got the record of 30000 digits in 24 hours then currently got the record with 1 million digits, he said he has access to a computer with some crazy power so there is no way anyone can beat him unless taking another approach. So I started looking into CUDA programming(complete beginner to this, but confident with C) I found the starter CUDA program(adding some arrays) and found that pretty simple to follow and I think I have a decent basic understanding of CUDA.

My current program uses GMP for arbitrary precision numbers
The pseudo code looks like:

Iterate i over all even numbers between N and 2N

    Some integer math
    while (some condition)
        Some integer math
    for some range(i use 50)
        test(a b)

test(a, b)
    Some integer math
    while(some condition)
        Some integer math
        if (cond)
            return true
        return false

I’ve got a couple questions:
Ovbiously GMP or MPIR wont work for CUDA, so I found by Nvidia

  1. I saw someone say it works up to 2 million digits, which technically will work for beating the current record but it doesn’t leave me a lot of room. But from reading the docs it says “the size of each integer in bits is referred to as its precision” and when creating ints(I think only available by creating an array of ints) xmpIntegersCreate() takes an argument for prescision as uint32_t. So the way I understand this is the max int size I could have is 2^4294967295.

  2. I don’t see any way of printing these, and they wont be usable in host code(I think) so how could I view the result when I find one?

  3. I guess any other advice?

Some numbers may be more likely prime than other (randomly chosen) odd numbers, e.g. Mersenne Primes

Alternatively consider testing numbers that are primorial primes.
Choose a fairly large primorial. Again, such numbers are more likely to be prime than any randomly chosen odd number (the reason being that the primorial acts like a sieve that guarantees your selected number does not contain any factors that are part of the the primorial)

I would actually suggest to extend this search to integer multiples of the primorial ± 1 as well because that gives you a nicely large search space over N, for any given primorial p#

N * p# ± 1

Additionally before testing primality, you could apply some sieve of Eratosthenes (for N=1… assuming a constant p#) up to a few million primes > p. The reasoning being that eliminating composite numbers by sieving will be much faster than performing a primality test on all (those eliminated) bigint numbers.

For the prime candidates that have passed the sieve I’d start with a base 2 Fermat test using the xmp library, then only run a more thorough primality test on the CPU (with gmp) when the Fermat test has indicated a possible prime.

Concerning the current leading prime with 1 million digits: it might be worthwhile to check (with a search engine) if this prime has been published somewhere else - if so, then it might be regarded as cheating - and the solution would be disqualified.

Thanks for the reply!

I will look into implementing both just searching the more probable primes and the sieve of Eratosthenes, a brief read of the topics looks like it could cut down on my search space quite significantly.

Why do you suggest only fermat on GPU then do the full test on CPU?

I am pretty certain the current record holder is not cheating, I have heard he is a pretty smart student and he shared he feels kind of like hes cheating because he has access to such a crazy powerful machine.

I actually finished my initial build of this program which takes a number N as an agument which will be used to calculate the starting number: (2^N)-1. The program then iterates over all odd numbers from [(2^N)-1, ((2^N)/2)-1] (highest to lowest). I also found there is an xmp 2.0 beta which looks like what I should be using(updated this year VS 7 years ago). I tested on a smaller number and it worked fine. I have it stop as soon as it finds one. Last night I began it running with N=31000 and when I woke up(about 7 hours later) it was still running.

I have the GPU running the tests on 1 000 000 or 100 000 numbers at a time(can’t remember)
The looping part of my program looks like:

populate the 1 000 000 xmp numbers by starting a gmp number at the high number of the interval then copying the gmp number to xmp
then subtract 2

copy the xmp numbers to gpu memory 

run kernel to test numbers
copy mem back and iterate over all numbers to see if it passed

if more numbers to search, look at next 1 000 000 number interval

I based my kernel function off sample 4 from the xmp2.0 samples:

While at school I was thinking of how to optimize this more, I looked at the gpu utilization and roughly it was at 0 then would go high, so I had the idea to have multiple CPU threads populating the 1 000 000 xmp numbers and multiple threads for looping over the numbers after the kernel has ran for checking if there was a passed test, that way the GPU is always running 100%

BUT there is one thing that I didn’t notice until I finished my program. The XMP 2.0 Library has a maximum of 32 000 bits so the maximum number I can test is 2^32000 which is only 9633 digits. This wouldn’t even beat his initial record of 30 000ish digits. I havn’t looked into increasing this limit yet, but I hope I can do so

The reason I am suggesting to only do a base 2 fermat test on GPU is that at those long digit numbers the likelihood of passing this test, and the number being composite is exceedingly small.

Hence it’s probably not worth spending the time and effort to implement a 100% reliable primality test on the GPU.

The CPU can handle that just fine, e.g. by calling the mpz_probab_prime_p() function.

Have you checked what binary digit limit the older xmp 1.0 library has? With some changes to the Makefile it is possible to make it reliably run on Pascal, Turing, Volta architectures as well. One might also have to hack the code that detects the GPU architecture based on compute capability, because I think it only detects up to Maxwell architectures and may fall back to Fermi (?) code for anything newer.


I don’t see any explicitly mentioned restrictions on the length of numbers (count argument to APIs) for xmp 1.0

A drawback of the xml 1.0 API is that it can only be called from host code.

As an exercise to the reader I’d propose to feed it some known Mersenne primes of various sizes for a fermat test verification: