(Warning: Maxwell only!)
All code is now open-sourced here as part of my personal fork of Meriken’s Tripcode Engine, a third-party tripcode finder for 10-character (DES crypt(3)) and 12-character (SHA-1) tripcodes.
Grab the following files, depending on what you are interested in:
UNIX DES crypt(3) with salts:
- DES_Crypt_##.h in /MerikensTripcodeEngine/Source Files/out/, where ## ranges from 0 to 63. The 64 files contain 4096 functions DES_Crypt_###(), where ### ranges from 0 to 4095 and corresponds to the 12-bit salt value used to perturb the DES expansion function.
- SBOXes: sbox.h in /MerikensTripcodeEngine/Source Files/
- Key-swapping macros: keyswap.h in /MerikensTripcodeEngine/Source Files/
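The 12-bit salt is why there are 4096 specialized functions. In the classic Unix crypt(3), each salt character from the alphabet `./0-9A-Za-z` maps to 6 bits, and every set salt bit i swaps entries i and i + 24 of the expansion table E. Each salt therefore yields a different but fixed E, which can be baked into its own function at compile time. The C sketch below shows that mapping. It is not code from the repository. It assumes the traditional salt alphabet and bit order (first character in the low 6 bits), and the helper names are hypothetical. It also assumes, without checking the generated files, that DES_Crypt_N corresponds to the salt value N computed this way.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Standard DES expansion table E (1-based bit positions of the 32-bit R half). */
static const uint8_t DES_E[48] = {
    32,  1,  2,  3,  4,  5,
     4,  5,  6,  7,  8,  9,
     8,  9, 10, 11, 12, 13,
    12, 13, 14, 15, 16, 17,
    16, 17, 18, 19, 20, 21,
    20, 21, 22, 23, 24, 25,
    24, 25, 26, 27, 28, 29,
    28, 29, 30, 31, 32,  1
};

/* Hypothetical helper: map one salt character from "./0-9A-Za-z" to 0..63. */
static unsigned salt_char_value(char c) {
    if (c == '.' || c == '/') return (unsigned)(c - '.');      /* 0, 1   */
    if (c >= '0' && c <= '9') return (unsigned)(c - '0') + 2;  /* 2..11  */
    if (c >= 'A' && c <= 'Z') return (unsigned)(c - 'A') + 12; /* 12..37 */
    assert(c >= 'a' && c <= 'z');  /* this sketch rejects anything else */
    return (unsigned)(c - 'a') + 38;                           /* 38..63 */
}

/* Hypothetical helper: 12-bit salt, first character in the low 6 bits. */
static unsigned crypt_salt12(const char salt[2]) {
    return salt_char_value(salt[0]) | (salt_char_value(salt[1]) << 6);
}

/* Build the salted expansion table: each set salt bit i (0..11)
   swaps entries i and i + 24 of E, as in the classic crypt(3). */
static void salted_expansion(unsigned salt12, uint8_t out[48]) {
    memcpy(out, DES_E, sizeof DES_E);
    for (int i = 0; i < 12; i++) {
        if ((salt12 >> i) & 1u) {
            uint8_t t = out[i];
            out[i] = out[i + 24];
            out[i + 24] = t;
        }
    }
}
```

For example, the salt `./` gives 64, which under the assumption above would select DES_Crypt_64(). Salt 0 (`..`) leaves E untouched, which is plain DES.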
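One round of DES (i.e. a single DES encryption, one of the 25 that crypt(3) performs):
- DES_Crypt_0() in DES_Crypt_0.h in /MerikensTripcodeEngine/Source Files/out/ (salt 0 leaves the expansion function unmodified, so this is plain DES); manually remove the outer loop and the DATASWAP lines.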
- SBOXes: sbox.h in /MerikensTripcodeEngine/Source Files/
- Key-swapping macros: keyswap.h in /MerikensTripcodeEngine/Source Files/
Note: the initial permutation and final permutation are not included in the function; you need to apply them yourself. Normally they’re part of the pre-processing, and both have minimal performance impact.
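For reference, the initial permutation IP and final permutation FP are the standard DES bit permutations, and FP is the inverse of IP. One consequence is that between consecutive crypt(3) iterations, the FP of one iteration and the IP of the next cancel out. They therefore only need to be applied once around the whole 25-iteration chain, which fits the note that their cost is minimal. The plain C sketch below uses hypothetical helper names and numbers bits the way the DES standard does (bit 1 is the most significant bit of the 64-bit block). It is not the bitsliced form the engine uses.

```c
#include <assert.h>
#include <stdint.h>

/* DES initial permutation IP: output bit i takes input bit DES_IP[i]. */
static const uint8_t DES_IP[64] = {
    58, 50, 42, 34, 26, 18, 10, 2,
    60, 52, 44, 36, 28, 20, 12, 4,
    62, 54, 46, 38, 30, 22, 14, 6,
    64, 56, 48, 40, 32, 24, 16, 8,
    57, 49, 41, 33, 25, 17,  9, 1,
    59, 51, 43, 35, 27, 19, 11, 3,
    61, 53, 45, 37, 29, 21, 13, 5,
    63, 55, 47, 39, 31, 23, 15, 7
};

/* DES final permutation FP = IP^-1. */
static const uint8_t DES_FP[64] = {
    40, 8, 48, 16, 56, 24, 64, 32,
    39, 7, 47, 15, 55, 23, 63, 31,
    38, 6, 46, 14, 54, 22, 62, 30,
    37, 5, 45, 13, 53, 21, 61, 29,
    36, 4, 44, 12, 52, 20, 60, 28,
    35, 3, 43, 11, 51, 19, 59, 27,
    34, 2, 42, 10, 50, 18, 58, 26,
    33, 1, 41,  9, 49, 17, 57, 25
};

/* Generic 64-bit permutation, building the output MSB first. */
static uint64_t permute64(uint64_t in, const uint8_t table[64]) {
    uint64_t out = 0;
    for (int i = 0; i < 64; i++) {
        assert(table[i] >= 1 && table[i] <= 64);  /* 1-based DES positions */
        out = (out << 1) | ((in >> (64 - table[i])) & 1u);
    }
    return out;
}

static uint64_t des_ip(uint64_t block) { return permute64(block, DES_IP); }
static uint64_t des_fp(uint64_t block) { return permute64(block, DES_FP); }
```

With the textbook input 0x0123456789ABCDEF, `des_ip` returns 0xCC00CCFFF0AAF0AA, and `des_fp(des_ip(x)) == x` for any block.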
The suggested register usage is 168 registers per thread, i.e. __launch_bounds__(128, 3), even when you wrap other code around the function.
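The two forms of that suggestion agree. On Maxwell GPUs such as the GTX 980 Ti (compute capability 5.2), an SM has 65536 32-bit registers, allocated per warp in units of 256 (8 per thread). `__launch_bounds__(128, 3)` asks for at least three 128-thread blocks to be resident per SM. That is 384 threads, or 12 warps, which leaves 65536 / 12 ≈ 5461 registers per warp. Rounded down to 5376, that gives 168 per thread. The C sketch below reproduces this arithmetic with a hypothetical helper. It is a simplified model that ignores shared-memory limits, per-SM block limits and anything else nvcc may take into account, so it is not what the compiler actually computes.

```c
#include <assert.h>

/* Simplified occupancy model: registers per thread that still allow
   min_blocks blocks of threads_per_block threads to be resident on one SM. */
static int max_regs_per_thread(int threads_per_block, int min_blocks,
                               int regs_per_sm, int warp_alloc_unit,
                               int hw_max_regs_per_thread) {
    assert(threads_per_block > 0 && min_blocks > 0 && warp_alloc_unit > 0);
    int warps_per_block = (threads_per_block + 31) / 32;  /* round up to whole warps */
    int warps = warps_per_block * min_blocks;
    int regs_per_warp = regs_per_sm / warps;
    regs_per_warp -= regs_per_warp % warp_alloc_unit;     /* round down to allocation unit */
    int regs = regs_per_warp / 32;
    return regs < hw_max_regs_per_thread ? regs : hw_max_regs_per_thread;
}
```

With Maxwell's figures, `max_regs_per_thread(128, 3, 65536, 256, 255)` gives 168. Asking for four resident blocks would drop that to 128, and a single block would be capped by the 255-register hardware limit.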
With this version, I get 950 MH/s for UNIX DES crypt(3) (or equivalently 23750 MH/s for a single DES encryption, since crypt(3) iterates DES 25 times) on my reference Gigabyte GTX 980 Ti (+270 MHz). Considering that hashcat’s implementation gets 165.5 MH/s on a GTX Titan X (+225 MHz), this is a great improvement. Even my naive implementation, bounded by shared memory/synchronization and using the old S-boxes from JtR (John the Ripper), is faster (300 MH/s on a 980 Ti at +300 MHz).
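The two figures are related because crypt(3) runs the salted DES encryption 25 times per hash, so 950 MH/s × 25 = 23750 million DES encryptions per second. The small C sketch below spells out that conversion and the raw ratio to the quoted hashcat number, using hypothetical helper names. The two rates were measured on different, differently overclocked cards, so the ratio is only a rough indication and not a like-for-like benchmark.

```c
#include <assert.h>

/* crypt(3) runs the salted DES encryption 25 times per hash. */
#define CRYPT3_DES_ITERATIONS 25

/* Convert a crypt(3) hash rate into single-DES throughput (same unit, e.g. MH/s). */
static double des_rate_from_crypt3(double crypt3_rate) {
    return crypt3_rate * CRYPT3_DES_ITERATIONS;
}

/* Plain ratio of two rates; it says nothing about hardware differences. */
static double speedup(double rate, double baseline_rate) {
    assert(baseline_rate > 0.0);
    return rate / baseline_rate;
}
```

`des_rate_from_crypt3(950.0)` gives 23750, and `speedup(950.0, 165.5)` comes out at roughly 5.7.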
This further dispels the myth that Nvidia cards are still bad at DES crypt(3). The claim may have held some truth before Maxwell when compared against AMD’s GCN, or even before sm_32, the first compute capability to raise the per-thread register limit from 63 to 255. It is no longer the case, though, and it all depends on the implementation.