Cryptography in CUDA DES/AES/RSA

Guys, does anybody have a CUDA implementation of one of these ciphers? I need it very much…

I think Fermi is the first architecture to have 32-bit integer support. I think it will take a while for someone to write a library for cryptography.

It’s not; 32-bit integers have been supported since the beginning.

I think the previous poster was referring to the fact that the hardware only supported 24-bit multiply for integers before Fermi (32-bit integer multiply compiled to multiple instructions).
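
For anyone who hasn’t run into this: __umul24() is the intrinsic that maps onto the native 24-bit multiplier on pre-Fermi parts, while a plain 32-bit multiply gets expanded into several instructions there. A minimal sketch (the kernel and array names are just made up for illustration):

```cuda
// Minimal sketch: on pre-Fermi GPUs the plain 32-bit product compiles to a
// short multi-instruction sequence, while __umul24() maps to the native
// 24-bit multiplier and only uses the low 24 bits of each operand.
__global__ void mul_compare(const unsigned int *a, const unsigned int *b,
                            unsigned int *full32, unsigned int *low24, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        full32[i] = a[i] * b[i];           // full 32-bit multiply
        low24[i]  = __umul24(a[i], b[i]);  // 24-bit multiply intrinsic
    }
}
```

For multi-precision arithmetic (RSA-sized numbers) you typically also need the upper half of the 32x32-bit product, which is what __umulhi() gives you.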

Do we have a big integer library yet?

The hard part is not writing the code; the hard part of running crypto on GPUs is finding a way to get the software everyone uses to throw huge piles of independent crypto ops at the graphics hardware. Nobody’s crypto library is designed to do that.

I.e., suppose the folks in this forum work their magic and build code that does RSA signatures on a Fermi card 10x faster than the PC it’s plugged into, but only for batches of 10000 RSA operations or more. Where are you going to get 10000 RSA operations to do, at the same time, out of a crypto library like OpenSSL?
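
To make that concrete, here is a rough host-side sketch of what a batched interface would have to look like. Everything in it is hypothetical: rsa_sign_batch() is just a placeholder kernel (the Montgomery multiplies and the private-key exponentiation are deliberately left out), and the 64-word buffers stand in for 2048-bit messages. The point is the shape of the call: the caller has to hand over thousands of independent messages in one go, which is not how an OpenSSL-style one-call-per-signature API feeds you work.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical batched-signature kernel: one thread per independent 2048-bit
// message (64 x 32-bit words). The body is a placeholder -- the actual
// modular exponentiation is exactly the part left out here.
__global__ void rsa_sign_batch(const unsigned int *msgs, unsigned int *sigs, int n_ops)
{
    int op = blockIdx.x * blockDim.x + threadIdx.x;
    if (op >= n_ops) return;
    for (int w = 0; w < 64; ++w)          // placeholder: pass the message through
        sigs[op * 64 + w] = msgs[op * 64 + w];
}

int main(void)
{
    const int n_ops = 10000;              // the "batch of 10000" from the post
    const size_t bytes = (size_t)n_ops * 64 * sizeof(unsigned int);

    unsigned int *h_msgs = (unsigned int *)calloc((size_t)n_ops * 64, sizeof(unsigned int));
    unsigned int *h_sigs = (unsigned int *)malloc(bytes);
    unsigned int *d_msgs, *d_sigs;
    cudaMalloc(&d_msgs, bytes);
    cudaMalloc(&d_sigs, bytes);

    // The caller already has to hold 10000 independent messages at this point --
    // that is the batching problem, regardless of how fast the kernel is.
    cudaMemcpy(d_msgs, h_msgs, bytes, cudaMemcpyHostToDevice);
    rsa_sign_batch<<<(n_ops + 255) / 256, 256>>>(d_msgs, d_sigs, n_ops);
    cudaMemcpy(h_sigs, d_sigs, bytes, cudaMemcpyDeviceToHost);

    printf("signed %d messages (placeholder kernel)\n", n_ops);
    cudaFree(d_msgs); cudaFree(d_sigs);
    free(h_msgs); free(h_sigs);
    return 0;
}
```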

Exactly. I cringe every time I see someone want to do H.264 decoding or AES with CUDA. Sure, it can be faster, but not by a lot, and in terms of power consumption it’s completely uncompetitive. H.264 decoding and AES are best handled by ASICs (or maybe SSE4).

Agreed. Take a look at this paper. GPUs could be ~10x more efficient thanks to wide SIMD, but ASICs are ~500x more efficient, because a programmable processor pays overhead just for having RISC-like instructions, register files, and simple functional units.

http://www.stanford.edu/~bcclee/documents/…10-isca-opt.pdf

For the random hackers, speed is way more important than power consumption. If it does offer 10x the performance, and that gap is going to widen in future generations of GPUs, then it is worth writing code for. Of course, for the deep-pocketed national defense folks, dedicated hardware is even better.

That’s misguided. According to this AES ASIC, it can encrypt at 40 Gbit/s for a cost of only 31k gates; I’d say the power is easily under 1 watt. A GPU is a 150 W monstrosity, and its AES throughput would be bounded at around 400 Gbit/s by a max memory bandwidth of 100 GiB/s (roughly 800 Gbit/s, halved because every byte has to be read in and written back out). So the power saving is at least 15x.
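
Spelling out the arithmetic behind that 15x figure (all numbers are the assumptions above, not measurements):

```cuda
#include <stdio.h>

// Figures taken from the post above (assumptions, not measurements).
int main(void)
{
    double gpu_gbit_per_watt  = 400.0 / 150.0;  // bandwidth-bound GPU: ~2.7 Gbit/s per watt
    double asic_gbit_per_watt =  40.0 /   1.0;  // the 31k-gate ASIC: ~40 Gbit/s per watt
    printf("ASIC advantage: ~%.0fx in throughput per watt\n",
           asic_gbit_per_watt / gpu_gbit_per_watt);  // prints ~15x
    return 0;
}
```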

The main problem with GPU AES is, as jasonp said, that you need massive amounts of data to get the maximum speedup. In real-world applications the data is streamed rather than available all at once, so you can either wait until enough data arrives at the GPU (a complete performance loss) or come up with a streaming implementation (which seems very hard); a sketch of the latter follows.
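
One shape a streaming implementation could take: chop the incoming data into chunks and rotate them over a few CUDA streams with cudaMemcpyAsync(), so transfers overlap the kernel instead of waiting for one giant batch. The encrypt_chunk() kernel below is just a stand-in (it XORs a constant, it is not AES), and the function name is made up for illustration.

```cuda
#include <cuda_runtime.h>

// Stand-in per-chunk kernel: XORs a constant so the pipeline has something to
// run. A real implementation would put the AES rounds here.
__global__ void encrypt_chunk(unsigned char *buf, size_t nbytes)
{
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nbytes) buf[i] ^= 0xAA;
}

// Chop the incoming data into chunks and rotate them over a few CUDA streams,
// so host->device copies, kernels, and device->host copies overlap instead of
// waiting until "enough" data has accumulated. host_data should be allocated
// with cudaMallocHost() for the async copies to actually overlap, and chunk
// should stay at a few MB so the 1-D grid fits on older hardware.
void encrypt_stream(unsigned char *host_data, size_t total, size_t chunk)
{
    const int NSTREAMS = 4;
    cudaStream_t streams[NSTREAMS];
    unsigned char *dev_buf[NSTREAMS];
    for (int s = 0; s < NSTREAMS; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&dev_buf[s], chunk);
    }

    int s = 0;
    for (size_t off = 0; off < total; off += chunk, s = (s + 1) % NSTREAMS) {
        size_t n = (off + chunk <= total) ? chunk : total - off;
        cudaMemcpyAsync(dev_buf[s], host_data + off, n,
                        cudaMemcpyHostToDevice, streams[s]);
        encrypt_chunk<<<(unsigned)((n + 255) / 256), 256, 0, streams[s]>>>(dev_buf[s], n);
        cudaMemcpyAsync(host_data + off, dev_buf[s], n,
                        cudaMemcpyDeviceToHost, streams[s]);
    }

    for (int t = 0; t < NSTREAMS; ++t) {
        cudaStreamSynchronize(streams[t]);
        cudaFree(dev_buf[t]);
        cudaStreamDestroy(streams[t]);
    }
}
```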

GPU Gems 3, Chapter 36, page 785: AES, by Takeshi Yamanouchi (SEGA). At the end they say they plan to do it in CUDA, so maybe you can follow their work unless it is private. It would also be a nice project to translate it to CUDA yourself if it is not publicly available as a paper or code library.

Best,

Alexander.
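
If someone does want to attempt that CUDA port, here is just a skeleton of how such a kernel is usually laid out (this is not the GPU Gems code and not a working cipher): one thread per independent 16-byte block, a round key in constant memory, and everything beyond the initial AddRoundKey left as a comment.

```cuda
#include <cuda_runtime.h>

// Skeleton only, NOT a working cipher. Real CUDA AES implementations typically
// keep the expanded key schedule and the lookup tables in constant or shared
// memory and give each thread one independent 16-byte block (ECB or CTR mode).
__constant__ unsigned char d_round_key[16];   // first round key only, for illustration

__global__ void aes_skeleton(unsigned char *data, size_t nblocks)
{
    size_t blk = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (blk >= nblocks) return;

    unsigned char state[16];
    for (int i = 0; i < 16; ++i)              // load this thread's block
        state[i] = data[blk * 16 + i];

    for (int i = 0; i < 16; ++i)              // initial AddRoundKey; the 10/12/14
        state[i] ^= d_round_key[i];           // rounds of SubBytes/ShiftRows/
                                              // MixColumns (or T-table lookups)
                                              // are omitted here.
    for (int i = 0; i < 16; ++i)              // write the block back
        data[blk * 16 + i] = state[i];
}
```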

Thanks for your reply. But how much does this AES ASIC cost??? Where can I buy it???

Re: AES ASICs… a cheap and low-power VIA x86 CPU can put up some impressive performance numbers: VIA PadLock™ Security Engine

The use cases that seem to get the most attention for the VIA are SSL and storage encryption. They’re seeing 60-120x speedups in some benchmarks.

The VIA docs describe the dedicated hardware units that make this happen.

These numbers should give you a decent price/performance/power target, if that even matters to the original poster.

Seems like good stuff that is worth spending some time exploring. Thanks.

I think GPUs are probably better suited to floating-point calculations than to integer-intensive cryptographic applications.

Given that the competition to create AES required that the algorithm be implementable on a smart card, it’s probably safe to say that the ASIC implementations will destroy a GPU on any reasonable metric.