AndreiB: Yes, I became aware of Elcomsoft’s work halfway through coding mine. But I finished it out of academic curiosity & for tinkering value (I couldn’t find any actual CUDA MD5 source code that I could compile and play with myself).
seibert: My GTS is the old one (it’s actually a QuadroFX 4600).
PS: I made performance vs. threads-per-block plots for the 8800 GTS 512 (a G92 card; data courtesy of Ales Koval). I’m surprised by the substantial change in the ‘calc’ test results compared to the old 8800 GTS (and the small difference in the ‘search’ test). Given that the main difference between the two modes is reading/writing to global memory, the G92s seem to be much better at memory access management than the old G80s.