Can GPU programming solve the following problem of finding solutions to an equation?

I need to find integer solutions to the equation a^a + b^b = c^c. I could resort to traditional programming in, say, the Python programming language, where I would use three for loops (followed by some basic mathematical operations) and check to see whether the sum on the left of the equation is exactly equal to the value on the right. Assuming I iterate each of the three loops from 0 to 1000, the program will need to check through a little over 1 billion possible integer combinations (calculated by 1001^3). Due to the symmetry of a^a + b^b, a better optimization would be to only work with scenarios where a<=b, in which case the program will need to check through a little over 502 million possible integer combinations. I can optimize further. Take the sum on the left and attempt to resolve it to c^c. Python’s gmpy2 library will easily do that for us; just iterate the exponent from 2 to log base 2 of the sum (it’s a trick from mathematics). See this example:

import math
import gmpy2
gmpy2.get_context().precision=1000 #The higher the better, but slower
sum = 285311670611 #Will be obtained from a^a + b^b. This is just an example
for c in range(2, round(math.log(sum, 2))+2):
	result = gmpy2.root(sum, c) #This is where the magic happens!
	if(result%1==0 and result==c):
		print(result)

This is great and all but the number of integer combinations increases as the range of a,b increases. So, should I (can I) resort to GPU programming to try and quickly find solutions to the equation? If you have any questions, ask away and I will gladly answer them. Thank you.

Even with your “optimizations” this is still a brute-force space search. Yes, GPUs can do that. When I offer up such things people often frown at it. It’s usually better to try to use an analytic method, from a performance perspective. But if you are out of further “optimization” ideas, brute-force searching is certainly possible, and GPUs can often do it a lot quicker than CPUs. The brute-force search problem is generally naturally “embarassingly parallel” which makes it (usually) easy to write code for.

Here is an example. Your kernel code might be simpler than what is presented there. To employ your gmpy2.root() method, you would have to come up with an equivalent realization in some flavor of CUDA. That library/method will not be directly usable in CUDA device code, regardless of flavor, I don’t think.

That was a great response. If the method gmpy2.root() won’t be directly usable in device code, looks like it will slow me down, won’t it? Might as well do away with it and try to come up with another approach, am I right?

I think the effort you put into analytic solutions and/or “optimizations” is definitely the best approach. I don’t know exactly what gmpy2.root does, but if it is a significant optimization (and you had no other ideas) I would try to implement it in regular python or C code, so that I could create a kernel version of it.

I wouldn’t be quick to discard ideas/learning.

gmpy2.root() is pretty neat. This is what it does: Given an integer N, express it as A^b if possible. It is 100% accurate and works with surprisingly large integers as long as they don’t exceed approximately 1.7e308.

I guess it is evident that you want to search well beyond the 64-bit integer space. In that case you would probably want to locate a multi-precision library that is usable in GPU code.

Here is one possibility. That would be a possible useful library, but I’m not suggesting it has an implementation of gmpy.root() in it.

In common parlance, gmpy2.root (a, n) computes the n-th root of a. According to the documentation for the integer variant iroot, it returns a numerical result as well as an indication whether the result is exact. So if the c-th root of sum is c, and this result is exact, then cc = sum.

I am not aware of a CUDA-accelerated package that computes the n-th root for arbitrary precision integers, but I certainly do not claim encyclopedic knowledge of all CUDA-accelerated software.

I am surprised by the search loop. It seems to me the search region could be defined a lot tighter. Mathematically, c can be expressed using the Lambert W function as c = 2lambertW (log (sum)) / log (2). For example, for sum = 285311670611 this returns 11. I chose to calculate via base 2 here as that may make it easier to set up the end points of the search interval for arbitrary-precision binary integers; ordinarily one would go through base e, i.e, c = exp (lambertW (log (sum)).

For arbitrary-precision integers, log (sum) could be very roughly approximated via bitLength (sum) if need be, and the Boost library would be one place to find an implementation of the Lambert W function. For something more Pythonic, take a look at scipy.special.lambertw. This approach should drastically narrow down possible candidates for c, at which point there might not be enough parallelism left to provide meaningful acceleration via GPU computing.

Thank you for the link. I will be sure to check out the library and test its usefulness.

I have also never come across a CUDA-accelerated package that computes the n-th root for arbitrary precision integers, so you might just be right. I have never heard of the LambertW function. That calls for extensive reading. Thank you for your reply!

In the past two decades, people have found very many interesting uses for the Lambert W function, in addition to number-theoretic uses particularly also in the physical sciences.

The remarkable thing about this is that although the math behind it has been around for about 250 years, it was not recognized as an important function in its own right until the 1990s when it first appeared as a proper, named function in Maple. Since this happened after my university years, I did not come across this function until 2005 or so.

1 Like

I came across this function about 24 hours ago. Learning at the feet of the guru here.