Does anyone know information about anyone who has successfully implemented integer prime factorization on the GPU (with performance improvements over CPU)?

CUDA 1.0 has 64 bit integer support (from the changelog), but I’m talking about 1024+ bit integers. Arithmetic on 1024 bit integers can probably be emulated using 64 bit integers, but there probably wont be too much of a performance benefit. I’m scouring the internet for links on this topic. If anyone has some additional information, that would be useful.

I guess emulating 1024 bit integer arithmetics via 64bit integers (after all we are talking about of factor 16) will give a performance hit. But the most painful will not be the increase number of ops, but the memory overhead as you will need to read 16x more data for one integer. If the code is already bandwidth limited, you most probably will get at least 16x slower performance. If the code is computation limited … do not know, but this sounds like something to experiment with.

Actually you don’t need to implement 1024-bit arithmetic to be able to do integer factorization (well, if you don’t use trial division for numbers of such size :) ).

I believe that most time consuming part of modern factorization algorithms — sieving — can be done on GPU with significant speedup, but it may require double precison support. ANyway, I’m not aware of any public information on this topic.

Sieving requires floating point operation, AFAIK. I’m not sure if single precision wil be good enough for it. Sorry, I don’t have very deep knowlegdge of NFS algorithms, I’m just sharing my thougts :)

In Mersenne Prime searches, such as GIMPS, they actually use an FFT to do multiplication of very large integers (billions of digits) efficiently. FFT-based large integer multiplication is important in this case because it has time complexity O(n log(n) log(log(n))), rather than O(n^2).

Using an FFT this way requires a certain level of floating point precision to ensure that round-off error doesn’t taint the final answer. GIMPS is getting close to the limit of double precision FFTs, and may have to switch to quad precision in order to multiply even bigger numbers.

Number field sieve methods may use similar tricks, but I don’t know anything about those.