Hi all, i’m new to CUDA programming and GPU programming in general.

I wanted to try to implement the basic Shank’s Baby Step Giant Step algorithm to solve the discrete logarithm problem in a given finite field.

The problem is that I don’t know how I can implement the modular exponentiations exploiting the GPU parallelization.

In particular, if for i=0,1,2,…,M I need to solve g^i mod n, with n order of the given finite field, which is the best way to parallelize these operations? If I use M threads to compute g^i mod n in parallel, I won’t get benefits, since the total complexity is bounded to the last operation (g^M mod n) .

The best way is to compute g^i with g*g^(i-1), but in this way I get sequential operations, not parallelized (and the complexity time will be O(M)).

And then, which is the best way to actually compute a modular exponentiation on CUDA?

Thank you very much.

Regards.