Hi all,
First of all, I apologise for any mistakes in my English.
I'm trying to optimize code that uses a brute-force search to find the character string (i.e. "the password") corresponding to a given SHA1 message digest.
The code is organized as follows:
[codebox]int main(int argc, char* argv[])
{
    …
    /* copy the "target" SHA1 msg digest and other parameters from host to device memory */
    …
    while(1)
    {
        …
        RunBruteForce_SHA1hash( grid, result);
        …
    }
}
/******* device code ***********/
extern "C" RunBruteForce_SHA1hash(grid , result)
{
    SHA1_DoBrute<<< grid, 256 >>>( result );
}
__global__ void SHA1_DoBrute( result )
{
    …
    /* brute force loop on all possible character combinations */
    for(i=0; i< password_length; i++){
        .........
        Block[i] = char_to_process;
        /* sha is a data structure holding all variables and arrays needed by SHA1 */
        SHA1Input(&sha, &Block[i], 1);
    }
    SHA1_Result(&sha);
    /* compare SHA1 hash generated by the string Block[password_length] with given SHA1 hash */
    .......
}
__device__ SHA1Input()
{
    …
    …
}
__device__ otherSHA1functions()
{
    …
    …
}[/codebox]
Right now, my code has a low level of parallelism: each thread picks a different "pwd" to check and runs the whole SHA1 process on the device by itself to generate the corresponding hash. My question is not about how to increase the level of parallelism (which would need a parallel SHA1 algorithm), but "simply" about how to make my code faster (I have already unrolled every loop I could).
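To make the "one candidate per thread" scheme concrete, here is a minimal sketch of how a thread could derive its own candidate from its global index. It is not my actual code: the lowercase-only charset, the fixed length `PWD_LEN`, and the names `index_to_candidate`, `enumerate_candidates` and `enumerate_on_device` are all assumptions made up for illustration, and the hashing step is replaced by a plain store.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define CHARSET_SIZE 26        /* assumption: lowercase letters only */
#define PWD_LEN 4              /* assumption: fixed candidate length */
#define THREADS_PER_BLOCK 256  /* matches the <<< grid, 256 >>> launch */

/* Treat idx as a base-26 number; digit i selects character i of the candidate. */
__host__ __device__ void index_to_candidate(unsigned long long idx, char* out)
{
    for (int i = 0; i < PWD_LEN; ++i) {
        out[i] = (char)('a' + (int)(idx % CHARSET_SIZE));
        idx /= CHARSET_SIZE;
    }
    out[PWD_LEN] = '\0';
}

/* Hypothetical kernel: each thread builds the candidate for its global index. */
__global__ void enumerate_candidates(unsigned long long first, char* out)
{
    unsigned long long gid =
        (unsigned long long)blockIdx.x * blockDim.x + threadIdx.x;
    char candidate[PWD_LEN + 1];
    index_to_candidate(first + gid, candidate);

    /* A real kernel would hash `candidate` here and compare the result with
       the target digest; this sketch only stores the string. */
    for (int i = 0; i <= PWD_LEN; ++i)
        out[gid * (PWD_LEN + 1) + i] = candidate[i];
}

/* Host helper: launches `blocks` blocks and copies the candidates back.
   host_out must hold blocks * THREADS_PER_BLOCK * (PWD_LEN + 1) bytes. */
int enumerate_on_device(unsigned long long first, int blocks, char* host_out)
{
    assert(blocks > 0);
    size_t bytes = (size_t)blocks * THREADS_PER_BLOCK * (PWD_LEN + 1);
    char* d_out = NULL;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) return -1;
    enumerate_candidates<<<blocks, THREADS_PER_BLOCK>>>(first, d_out);
    cudaError_t err = cudaMemcpy(host_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : -1;
}
```

With this mapping, each pass of the host-side `while(1)` loop would only need to advance `first` by the number of threads launched, so no two threads test the same string.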
The variables needed by the SHA1 process are "intermediate": they are created in device memory and destroyed without ever being mapped by the host or copied back to it. I moved some of these "intermediate" variables to shared memory, since I understood that it is a good place to hold variables if you want faster code.
For example, the array holding the message digest is an unsigned int msgDigest[5] per thread, so in shared memory it has to become an unsigned int msgDigest[blockDim][5], with one row per thread of the block. This array is used in almost all the device SHA1 functions.
I define
[codebox]/* 256 = threads per block; blockDim is not a compile-time constant */
__shared__ unsigned msgDigest[256][5];
__global__ void SHA1_DoBrute( result ){
    …
    /* msgDigest initialization per thread in the block */
    msgDigest[threadIdx.x][0] = 0x67452301;
    msgDigest[threadIdx.x][1] = 0xEFCDAB89;
    etc...
    for(i=0; i< password_length; i++){
        .........
        SHA1Input(&sha, &Block[i], 1);
    }
    …
}[/codebox]
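For anyone who wants to try the layout described above, here is a small self-contained sketch of it. The kernel name `init_digest_kernel`, the helper `run_init_digest` and the macro names are hypothetical, and the SHA1 rounds are replaced by a copy to global memory so the result can be inspected. The five initial values are the standard SHA1 ones. The row length of 5 words is odd, so threads of a warp that read the same column of their own rows should fall into different shared-memory banks.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 256  /* matches the <<< grid, 256 >>> launch */
#define DIGEST_WORDS 5         /* SHA1 state is five 32-bit words */

/* Hypothetical kernel: one shared-memory digest row per thread. */
__global__ void init_digest_kernel(unsigned int* out)
{
    /* Static shared arrays need compile-time dimensions,
       so blockDim.x cannot be used as the first dimension. */
    __shared__ unsigned int msgDigest[THREADS_PER_BLOCK][DIGEST_WORDS];

    unsigned int* h = msgDigest[threadIdx.x];
    h[0] = 0x67452301u;
    h[1] = 0xEFCDAB89u;
    h[2] = 0x98BADCFEu;
    h[3] = 0x10325476u;
    h[4] = 0xC3D2E1F0u;

    /* Stand-in for the SHA1 processing: each thread only touches its own row,
       so no __syncthreads() is needed before reading it back. */
    unsigned int gid = blockIdx.x * blockDim.x + threadIdx.x;
    for (int k = 0; k < DIGEST_WORDS; ++k)
        out[gid * DIGEST_WORDS + k] = h[k];
}

/* Host helper: host_out must hold blocks * THREADS_PER_BLOCK * DIGEST_WORDS words. */
int run_init_digest(unsigned int* host_out, int blocks)
{
    assert(blocks > 0);
    size_t bytes = (size_t)blocks * THREADS_PER_BLOCK * DIGEST_WORDS
                 * sizeof(unsigned int);
    unsigned int* d_out = NULL;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) return -1;
    init_digest_kernel<<<blocks, THREADS_PER_BLOCK>>>(d_out);
    cudaError_t err = cudaMemcpy(host_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : -1;
}
```

This uses 256 × 5 × 4 = 5120 bytes of shared memory per block. For comparison, the same five words could be kept in plain local variables, which the compiler may place in registers; I have not measured which of the two is faster here.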
I'm sorry if my question sounds silly, but do you think this approach makes sense? I ran this code, and the result is NO IMPROVEMENT at all.
Do you have any suggestions?
Thank you all in advance.