WPA2-PSK implemented in CUDA

Just an update for anyone interested: I have tested the latest version (rev46) and it successfully compiled and ran on an 8800 GTS.

The Pyrit commandline-client © 2008 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3

Available cores: ‘Standard CPU’, ‘Nvidia CUDA’

Testing CPU-only core ‘Standard CPU’ (4 CPUs)…
10000 PMKs in 11.00 seconds: 909.38 PMKs/s
Result hash: OK

Testing GPU core ‘Nvidia CUDA’ (Device ‘GeForce 8800 GTS’)
10000 PMKs in 3.42 seconds: 2922.08 PMKs/s
Result hash: OK

nice work ebfe

Good to hear; the hanging should finally be fixed…

In rev47 I’ve updated the benchmarking code to give more realistic results. As the CUDA kernel must not run longer than 5 seconds, the code has to calibrate itself to your card’s speed. This is done in the first few seconds of usage, and in rev47 the benchmarking code will first “burn in” before taking the results for granted.
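To illustrate the idea, here is a minimal sketch of the principle (not Pyrit's actual calibration code): keep doubling the number of PMKs handed to one kernel call until a single call takes a noticeable fraction of a second, while staying well clear of the roughly 5-second watchdog that display-attached GPUs enforce.

import time

# Sketch only, not Pyrit's code: `run_kernel(n)` stands in for a hypothetical
# call that computes n PMKs in a single CUDA kernel launch.
def calibrate(run_kernel, size=512, target=0.5, watchdog=5.0):
    while True:
        start = time.time()
        run_kernel(size)
        elapsed = time.time() - start
        # Stop once a single call takes long enough for a stable measurement,
        # or if doubling the workload could come close to the watchdog limit.
        if elapsed >= target or elapsed * 2.0 >= watchdog:
            return size
        size *= 2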

I just tried out rev 47 on both my GTX 280 and my 8800 GTX. The GTX 280 is in a 2.6 GHz Phenom 9950 quad core:

Testing CPU-only core 'Standard CPU' (4 CPUs)... 942.61 PMKs/s

Testing GPU core 'Nvidia CUDA' (Device 'GeForce GTX 280')... 8915.46 PMKs/s

The 8800 GTX is in a 2.2 GHz Phenom 9500 quad core:

Testing CPU-only core 'Standard CPU' (4 CPUs)... 811.90 PMKs/s

Testing GPU core 'Nvidia CUDA' (Device 'GeForce 8800 GTX')... 6480.91 PMKs/s

That’s really impressive!

Please try importing a large password set and running the batch on the GTX 280. The speed should be much higher than 8.900 rounds per second.

For that to work, each workunit should be at least 50.000 passwords in size.

How do I do that?

Delete your current password blobspace (the password directory in the blobspace directory) to have a fresh start. Then you can fill it with 2.000.000 random passwords like this:

python -c "import random, string; print '\n'.join((''.join(random.sample(string.letters, 10)) for i in xrange(2000000)))" | ./pyrit_cli.py -f - import_passwords

Each workunit should then be 2.000.000 * 12 / 256 ~= 93.000 passwords in size. Then run a batch process.

Note to other people reading this thread who might also be trying this out: you need to create a new ESSID first:

./pyrit_cli.py -e foo create_essid

Then when you run

./pyrit_cli.py batch

something will actually happen. :)

Anyway, I’ve left the GTX 280 chewing on this for 15 minutes, and the rate has stabilized around:

Working on unit '0b79dfb38161092284336c6b5e93dc07' (117/256), 93612 PMKs to do.

  -> All done. (16.24 mins, 11344.42 PMK/sec, 185867004.95 SHA1/sec).

This seems more like it. Even higher performance should be possible (somewhere around 15.000 PMKs/s on a GTX 280), but Pyrit doesn’t scale that well yet…

I’ve updated the performance graphics on http://pyrit.googlecode.com :-)

I’d also be curious to see how one-half of a 9800 GX2 performs on this code. Four 9800 GX2 cards (8 CUDA devices) in one computer is probably still the most powerful CUDA workstation that can be built today.

Very roughly extrapolating from the 8800 GTX point (scaling by relative memory bandwidth), I’d expect this computer:

http://fastra.ua.ac.be/en/index.html

could hit 40000 PMKs per second with a multi-threaded CUDA version of pyrit, and only cost about $4k.
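For what it's worth, here is the back-of-the-envelope math behind that guess, assuming the published bandwidth figures (8800 GTX ≈ 86.4 GB/s, each half of a 9800 GX2 ≈ 64 GB/s) and eight CUDA devices in the FASTRA box:

# Rough extrapolation by relative memory bandwidth; all figures are assumptions.
pmk_8800_gtx = 6480.91    # PMKs/s measured above on the 8800 GTX
bw_8800_gtx = 86.4        # GB/s memory bandwidth of the 8800 GTX
bw_gx2_per_gpu = 64.0     # GB/s for each half of a 9800 GX2
num_gpus = 8              # four GX2 cards = eight CUDA devices

estimate = pmk_8800_gtx * (bw_gx2_per_gpu / bw_8800_gtx) * num_gpus
print(int(estimate))      # ~38400, i.e. roughly 40000 PMKs per second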

Hi!

Your project looks really cool!

  1. Did you use PyCuda? How was it?

  2. If I were to use this program to show my professors the power of CUDA, e.g. trying to crack a known password generated with GPG, should I be worried about hash collisions?

Pyrit does not use PyCuda. I did take a look at it and it seems like a great library; however, its use is more targeted at scientific/numerical problems.

Hash collisions will not occur. The output Pyrit generates consists of 32-byte (160 + 96 bit) hashes, which are in turn used as keys for further encryption in the authentication phase. It is unbelievably unlikely that a collision occurs between two sets of two SHA-1 values.
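For reference, the value computed per password is the WPA/WPA2 pairwise master key: PBKDF2-HMAC-SHA1 over the passphrase with the ESSID as salt, 4096 iterations, 32 bytes of output. A minimal CPU-side sketch with Python's standard hashlib (not Pyrit's CUDA kernel, which performs the same computation for many passwords in parallel):

import hashlib

def wpa_pmk(passphrase, essid):
    # 32-byte PMK per IEEE 802.11i: PBKDF2(HMAC-SHA1, passphrase, essid, 4096 iterations, 256 bits)
    return hashlib.pbkdf2_hmac('sha1', passphrase.encode(), essid.encode(), 4096, 32)

print(wpa_pmk('mysecretpassphrase', 'foo').hex())

The two PBKDF2 blocks (4096 HMAC-SHA1 iterations each, roughly two SHA-1 operations per HMAC) are also why the output above works out to about 16,384 SHA-1 operations per PMK: 11344.42 PMK/sec × 16384 ≈ 185.87 million SHA1/sec.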

I’ve posted some more general info on CUDA’s effect on WPA-PSK at http://pyrit.wordpress.com/the-twilight-of...otected-access/

FYI :)

Is this your project, or another (perhaps over-dramatized) effort?
SC Media UK

This is my project and it’s distributed free of charge under an open-source license; you are free to inspect or modify it.

With a lot of buzzword-trumpet-sound, Elcomsoft re-announced their commercial tool the day before yesterday with an ad on scmagazineuk.com. Their tool is completely unrelated to mine, and their primary goal is to make money from it.

This article is really about your project? Or am I missing something?

This is not true. AFAIK, we do not have any ads on scmagazineuk.com, nor have we paid for that article (maybe GSS did). I think you should check the facts before posting on a public forum.

On October 10 we released a new version of EDPR, which now supports GPU acceleration of WPA/WPA2 handshakes and hashes. Over-dramatization of our press releases is not something new for us, but that is really something we have little control over. If you’re interested, please check the original document that was distributed: Link.

ebfe, I can assure you that this release is not related to your project and it doesn’t contain even a single line of your source code. In fact, the code was written long before your project even existed.

I was referring to Pyrit as being my project. As I said, there is no connection between EDPR and Pyrit, and they do not share a common codebase.

I haven’t tried your program yet… I’m in Vista 32-bit right now… later I’ll boot into Ubuntu 8.04 and give it a try.

What I was really wondering about is multi-GPU capability.

Do you plan on modifying the code to support more than one GPU at a time?
I have noticed this quite a bit: the coder only writes for a single-GPU environment, when in fact many people who want to use CUDA have more than one GPU in their case.

Why are there so many coders writing example programs for only one GPU?

I have two 8800 GTS G92s and I think they would compete with the GTX 280… maybe.

Well, it is a lot more challenging to program for two or more GPUs, not all people have multi-GPU systems to develop on, and some algorithms just don’t do well on multiple GPUs because the significant PCIe transfers they require slow things down.

To my understanding, a lot of people do have multiple GPUs.

To my understanding, a lot of programs scale very well (I have read they usually add around 99% per GPU of the same type that is added).

Now, I agree with you when you say it’s probably a lot easier to program for one GPU.

What surprises me is that more people are not trying to learn to implement multi-GPU in their code as often as they can. Examples of this are Badaboom and TMPGEnc 4 with GPU acceleration, plus all the examples in these forums.

I think the only real multi-GPU code I have seen was the simpleMultiGPU sample in the NVIDIA SDK, and that is really about it.

I think it would be good practice to have the program look for the number of GPUs at startup so it can adjust accordingly, instead of programming only for a single GPU and maybe coming back later to add multi-GPU support.

Now, I’m new to programming… I just started reading K&R 2nd edition for C.

It just seems logical to have the code adjust to how many GPUs it detects… I mean, even if the code runs better on a single GPU, it would be good practice to start adding in multi-GPU support from the beginning.

Implementing multi-GPU support in Pyrit is definitely on my agenda. However, the only box with CUDA-capable hardware I have is a MacBook Pro.

If someone could provide a shell account (root not required) on an always-on, Linux-driven, multi-GPU system, I might get that done within a few days. It’s actually pretty easy.
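To sketch what I mean by “pretty easy” (an outline only, not Pyrit’s actual code): detect the devices at startup, start one worker per GPU, and let all workers pull workunits from a shared queue.

import queue
import threading

# Sketch only: `count_devices()` and `compute_pmks(device_id, workunit)` are
# hypothetical placeholders for whatever the CUDA core would actually expose.
def run_on_all_gpus(workunits, count_devices, compute_pmks):
    jobs = queue.Queue()
    for unit in workunits:
        jobs.put(unit)

    def worker(device_id):
        # Each worker thread stays pinned to one device index and keeps
        # pulling workunits until the shared queue runs dry.
        while True:
            try:
                unit = jobs.get_nowait()
            except queue.Empty:
                return
            compute_pmks(device_id, unit)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(count_devices())]
    for t in threads:
        t.start()
    for t in threads:
        t.join()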