Baby Step Giant Step Algoritm on CUDA

Hi all, i’m new to CUDA programming and GPU programming in general.

I wanted to try to implement the basic Shank’s Baby Step Giant Step algorithm to solve the discrete logarithm problem in a given finite field.
The problem is that I don’t know how I can implement the modular exponentiations exploiting the GPU parallelization.
In particular, if for i=0,1,2,…,M I need to solve g^i mod n, with n order of the given finite field, which is the best way to parallelize these operations? If I use M threads to compute g^i mod n in parallel, I won’t get benefits, since the total complexity is bounded to the last operation (g^M mod n) .
The best way is to compute g^i with g*g^(i-1), but in this way I get sequential operations, not parallelized (and the complexity time will be O(M)).

And then, which is the best way to actually compute a modular exponentiation on CUDA?

Thank you very much.


I have no experience on this but I think it is the same as prefix sum,
of course, it is not summation but multiplication.
prefix mul(g g g g) = (g g^2 g^3 g^4)

Do you mean an algorithm like that ?

Substituting the sum operator with the modular product?

Ops sorry i link you the wrong page. I meant that one

yes, source code of prefix sum is also available in SDK.

You can check if it is what you want.

Hi, finally I found a library usefule to manipulate big numbers for cryptographic purposes, it’s called tommath library.

I would like to use the functions defined in it to make my algorithm parallel and efficient, but when I use these functions in another global function, the compiler tells me that it’s not allowed to call a host function from a global/device function.

Is it possible to use this libraries in some way, exploiting the GPU power?

Thank you very much.