My first attempt at learning CUDA is a little unusual: accelerating one of the computationally intensive parts of the general number field sieve (GNFS) algorithm for factoring integers. Most of the interest is in either the sieving or the linear algebra stages, since those take the most time, but my initial work is on the first stage: polynomial selection.

About this time last year, Thorsten Kleinjung described a new polynomial selection algorithm that improved noticeably on the previous state of the art (also invented by him), and it's described here. I've implemented Kleinjung's improved algorithm in the Msieve library (project page), and it turns out the core of the algorithm is very amenable to a GPU: small dataset, lots of arithmetic, regular memory access pattern, etc.

The good stuff is in branches/msieve-gpu/gnfs/poly/stage1, currently under heavy development. I have CPU and GPU code that performs the same computations, and compared to a 1.86 GHz Core 2 Duo, even an older 9800GT card runs 3.8x faster. And that's in spite of the well-documented problems nvcc has with multiprecision arithmetic.

It’s really impressive how quickly an amateur like me can get up to speed programming Nvidia’s cards, and this forum has a lot to do with that. Keep the great discussions going, I’m learning a bunch!