I’ve been playing with big integers on my 8800GTX, with satisfying results. I can get 13k 512-bit Montgomery exponentiations per second. That is enough for 6.5k 1024-bit RSA private-key decryptions (essentially SSL server handshakes) per second, because a decryption using the Chinese Remainder Theorem needs two 512-bit exponentiations.
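For anyone who hasn't implemented it, here is a minimal sketch of what one Montgomery exponentiation involves. It is an illustration, not the kernel behind the numbers above. It makes several assumptions: 32-bit limbs (least significant first), the CIOS (Coarsely Integrated Operand Scanning) variant of Montgomery multiplication, and plain left-to-right square-and-multiply. It is not constant-time. All function names (`mont_mul`, `mont_exp`, etc.) are hypothetical. It is written as plain C so it can be tested on the host. In a CUDA kernel the functions would also need a `__device__` qualifier.

```c
#include <stdint.h>
#include <string.h>

#define MAX_LIMBS 16 /* 16 x 32-bit limbs = 512 bits */

/* Returns -n0^{-1} mod 2^32 for odd n0 (each Newton step doubles the correct bits). */
static uint32_t mont_n0inv(uint32_t n0) {
    uint32_t x = 1; /* correct mod 2 */
    for (int i = 0; i < 5; i++) x *= 2 - n0 * x;
    return 0u - x;
}

/* If (hi:t) >= N, subtract N from t in place. hi is an extra top word that is 0 or 1. */
static void sub_if_ge(uint32_t *t, uint32_t hi, const uint32_t *N, int n) {
    int ge = hi != 0;
    if (!ge) {
        ge = 1; /* t == N also counts as >= */
        for (int j = n - 1; j >= 0; j--)
            if (t[j] != N[j]) { ge = t[j] > N[j]; break; }
    }
    if (!ge) return;
    uint64_t borrow = 0;
    for (int j = 0; j < n; j++) {
        uint64_t d = (uint64_t)t[j] - N[j] - borrow;
        t[j] = (uint32_t)d;
        borrow = (d >> 32) & 1; /* 1 if this word's subtraction wrapped */
    }
}

/* r = a * b * R^{-1} mod N with R = 2^(32n), CIOS method. Requires a, b < N; r may alias a or b. */
static void mont_mul(uint32_t *r, const uint32_t *a, const uint32_t *b,
                     const uint32_t *N, uint32_t n0inv, int n) {
    uint32_t t[MAX_LIMBS + 2] = {0};
    for (int i = 0; i < n; i++) {
        uint64_t c = 0;
        for (int j = 0; j < n; j++) { /* t += a * b[i]; each step consumes the previous carry */
            c += (uint64_t)t[j] + (uint64_t)a[j] * b[i];
            t[j] = (uint32_t)c;
            c >>= 32;
        }
        c += t[n];
        t[n] = (uint32_t)c;
        t[n + 1] = (uint32_t)(c >> 32);
        uint32_t m = t[0] * n0inv; /* chosen so that t + m*N is divisible by 2^32 */
        c = ((uint64_t)t[0] + (uint64_t)m * N[0]) >> 32;
        for (int j = 1; j < n; j++) { /* t = (t + m*N) / 2^32 */
            c += (uint64_t)t[j] + (uint64_t)m * N[j];
            t[j - 1] = (uint32_t)c;
            c >>= 32;
        }
        c += t[n];
        t[n - 1] = (uint32_t)c;
        t[n] = t[n + 1] + (uint32_t)(c >> 32);
    }
    sub_if_ge(t, t[n], N, n); /* t < 2N here, so one subtraction suffices */
    memcpy(r, t, n * sizeof(uint32_t));
}

/* rr = R^2 mod N, by doubling 1 modulo N 64n times (slow, but needs no division). */
static void mont_rr(uint32_t *rr, const uint32_t *N, int n) {
    memset(rr, 0, n * sizeof(uint32_t));
    rr[0] = 1;
    for (int k = 0; k < 64 * n; k++) {
        uint32_t top = 0;
        for (int j = 0; j < n; j++) { /* shift left by one bit */
            uint32_t out = rr[j] >> 31;
            rr[j] = (rr[j] << 1) | top;
            top = out;
        }
        sub_if_ge(rr, top, N, n);
    }
}

/* r = base^e mod N by square-and-multiply in the Montgomery domain.
   Requires odd N > 1 and base < N; e has e_limbs 32-bit words. */
static void mont_exp(uint32_t *r, const uint32_t *base, const uint32_t *e, int e_limbs,
                     const uint32_t *N, int n) {
    uint32_t n0inv = mont_n0inv(N[0]);
    uint32_t rr[MAX_LIMBS], x[MAX_LIMBS], acc[MAX_LIMBS], one[MAX_LIMBS] = {1};
    mont_rr(rr, N, n);
    mont_mul(x, base, rr, N, n0inv, n);  /* x = base * R mod N */
    mont_mul(acc, one, rr, N, n0inv, n); /* acc = R mod N, i.e. 1 in Montgomery form */
    for (int i = e_limbs * 32 - 1; i >= 0; i--) {
        mont_mul(acc, acc, acc, N, n0inv, n);
        if ((e[i / 32] >> (i % 32)) & 1) mont_mul(acc, acc, x, N, n0inv, n);
    }
    mont_mul(r, acc, one, N, n0inv, n); /* leave Montgomery form: acc * R^{-1} */
}
```

For example, with `N = {7}`, `base = {3}`, `e = {5}` and `n = 1`, `mont_exp` yields `{5}`, since 3^5 = 243 ≡ 5 (mod 7). Both inner loops thread a carry through every 32-bit step, so carry handling sits on the hot path.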
Has anyone else tried implementing big-integer arithmetic? A big problem with CUDA as it stands is that PTX assembly doesn't provide an add-with-carry instruction, so the carry out of each word has to be computed with extra instructions. I read somewhere that this functionality exists in the NV8 hardware but just isn't exposed; does anyone know if this is true?
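The usual workaround is to recover the carry in software. When an unsigned add wraps, the result is smaller than the operand it started from. Here is a minimal sketch of multi-word addition done that way. It uses a hypothetical helper `add_limbs` and assumes 32-bit limbs stored least significant first. It is written in plain C, and CUDA device code accepts the same code with a `__device__` qualifier.

```c
#include <stdint.h>

/* r = a + b over n 32-bit limbs (least significant first); returns the final carry.
   With no add-with-carry instruction, the carry out of each 32-bit add is recovered
   by comparison: an unsigned sum that wrapped is smaller than the value it started from. */
static uint32_t add_limbs(uint32_t *r, const uint32_t *a, const uint32_t *b, int n) {
    uint32_t carry = 0;
    for (int i = 0; i < n; i++) {
        uint32_t s = a[i] + b[i];
        uint32_t c1 = s < a[i];   /* carry out of a[i] + b[i] */
        uint32_t t = s + carry;
        uint32_t c2 = t < s;      /* carry out of adding the incoming carry */
        r[i] = t;
        carry = c1 | c2;          /* at most one of c1, c2 can be set */
    }
    return carry;
}
```

Each limb costs two extra compares and an OR on top of the two adds. A native carry flag would fold this into a single add-with-carry.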