Has anybody had any success implementing bitslice DES?
This is a well-known method for speeding up DES encryption routines:
My implementation follows the original source code almost exactly.
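For readers unfamiliar with the technique, here is a minimal C sketch of the bitslicing idea. Each 32-bit word holds one bit position taken from 32 independent blocks, so every AND/XOR evaluates that gate for all 32 blocks at once. A real bitslice DES expresses each S-box as a network of such gates over 64 block words and 56 key words. The 3-input "S-box" below is a hypothetical toy, not a DES S-box, and the names `toy_sbox_scalar`, `toy_sbox_bitsliced`, `to_bitsliced` and `bitslice_self_check` are illustrative only.

```c
#include <stdint.h>

/* Hypothetical toy gate network standing in for an S-box (NOT a DES S-box):
   out = (a AND b) XOR c, evaluated for a single block. */
int toy_sbox_scalar(int a, int b, int c) {
    return ((a & b) ^ c) & 1;
}

/* Same gates, bitsliced: bit j of every word belongs to block j, so one
   evaluation processes 32 independent blocks in parallel. */
uint32_t toy_sbox_bitsliced(uint32_t a, uint32_t b, uint32_t c) {
    return (a & b) ^ c;
}

/* Transpose 32 blocks of 3 bits into 3 slice words: slices[i] gathers
   bit i of every block, with block j stored at bit position j. */
void to_bitsliced(const uint8_t blocks[32], uint32_t slices[3]) {
    for (int i = 0; i < 3; i++) {
        slices[i] = 0;
        for (int j = 0; j < 32; j++)
            slices[i] |= (uint32_t)((blocks[j] >> i) & 1u) << j;
    }
}

/* Compare the bitsliced result with the scalar version for every block;
   returns the number of mismatching blocks (0 means they agree). */
int bitslice_self_check(void) {
    uint8_t blocks[32];
    uint32_t s[3];
    for (int j = 0; j < 32; j++)
        blocks[j] = (uint8_t)((j * 5 + 3) & 7);   /* arbitrary 3-bit inputs */

    to_bitsliced(blocks, s);
    uint32_t out = toy_sbox_bitsliced(s[0], s[1], s[2]);

    int mismatches = 0;
    for (int j = 0; j < 32; j++) {
        int expect = toy_sbox_scalar(blocks[j] & 1, (blocks[j] >> 1) & 1,
                                     (blocks[j] >> 2) & 1);
        if ((int)((out >> j) & 1u) != expect)
            mismatches++;
    }
    return mismatches;
}
```

The cost of this layout is state size. Every input bit becomes a full word that must stay live while the gates run. On a GPU, each thread would hold all of these words, which is where the register pressure comes from.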
However, the register count is huge:
[codebox]>ptxas info : Used 60 registers, 2160+0 bytes lmem, 28+16 bytes smem[/codebox]
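So about 2 KB per thread (2160 bytes, i.e. 540 32-bit values) is spilled to local memory, and on my 9500 GT board the code runs only roughly as fast as on an Intel Core Duo. This is probably because of the constant accesses to local memory, which resides in off-chip device memory and is as slow as global memory.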
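The ptxas figures translate into the rough per-thread footprint below. The last line is an estimate, not a measurement. It assumes 32-bit registers and a bitslice layout with one 32-bit word per block bit and per key bit (64 + 56 words), counting only the block and key slices before any S-box temporaries:

```latex
\begin{aligned}
\text{registers}      &= 60 \times 4\ \text{B} = 240\ \text{B} \\
\text{spilled (lmem)} &= 2160\ \text{B} = 2160 / 4 = 540\ \text{words} \\
\text{total}          &= 240 + 2160 = 2400\ \text{B} = 600\ \text{words per thread} \\
\text{block + key slices} &\approx (64 + 56) \times 4\ \text{B} = 480\ \text{B} = 120\ \text{words}
\end{aligned}
```

Under that assumption, the block and key slices alone are twice the 60 registers ptxas allocated.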
So what can be done?