CryptAnalysis on CUDA

Hello everyone,

I am thinking of implementing few of the cryptography hashing algorithm on CUDA, such as MD5 ,SHA-256. Is it possible to implement on CUDA so I can exploit parallel processing of graphics card.


Yes, I believe that other people have already done some of them (though they are probably included in proprietary products…so no source code available). Off the top of my head, I would think the best way is to decide on a standard input buffer size, then make an array of those buffers in device memory and upload your data to the device, then run a single thread for each piece of data in memory. I don’t know how much (if any) of the actual hashing algorithms can be parallelized…they are generally serial in nature.

Well thanks for reply.

I knew about hashing working in serial and making it to implement as threads. Can you suggest any cryptography algorithms which I can compare and there is very little work done on it.

Well, if you’re just looking to write a program to learn CUDA and cryptography methods…how about trying to write a hash cracker? MD5, SHA1, and SHA2 (for the 224-,256-,384, and 512-bit varieties) should be enough to get you started. Basically, you’d run the kernel repeatedly with various input data to build a lookup table or rainbow table, and then you could try to use the lookup table to find the plaintext for a hash. If you got it working, and it was just for fun, maybe you could post your code up here for others to see.

If you do try the hashing functions, it would probably be a good idea to store the initialization values (‘s-box’ in some algorithms) in constant memory so that all of the threads have fast access to them (since each thread will need the values to compute it’s output value).

Various hash functions are certainly possible to implement in GPU, but the whole idea of putting hashing into GPU is somewhat questionable.
Traditional hash functions such as MD4, MD5, SHA-1, SHA-2, RIPEMD, etc. operate on stream of data in serial manner, i.e. to calculate hash of 1 MB message you first split message into smaller blocks (typically 64 bytes) and then process those blocks in order. You cannot process blocks out of order or in parallel, because processing block (i+1) requires result of processing block (i). Besides, modern hashing algorithms are pretty fast and calculating it is not the bottleneck in most applications (usually it is disk read speed).

One good example when application can benefit from hashing on GPU is indeed a hash cracking, as profquail suggested. There are some open-source project which can be used as a starting point, I’d suggest looking at very basic MD5 cracker of my own: Also, take a look at pyrit, which is open-source WPA password cracker (which is basically iterated SHA-1).

BTW, the good news are that most (if not all) of SHA-3 candidate algorithms are designed to take benefit of massively parallel architectures (i.e. it is possible to compute hash of big message in parallel).