I need to find complex roots (to be precise just the negative imaginary part) of polynomial 5th order. The problem is I have plenty polynomials (about 10^7). I’am using Bairstow’s method on CPU and the time to find roots is really long. I’ve heard about CUDA and I was wondering if it is possible to pass one polynomial to each thread and then each thread finds and saves imaginary part of root to an array? Is it worth to use GPU instead of CPU in this problem?
Looking at the Wikipedia description of the algorithm, I don’t see any reason why your approach shouldn’t work. Will it be faster than the corresponding CPU code? Impossible to say without looking at many specifics (such as: what CPU, what GPU, single-precision or double-precision computation, etc). It seems to be a small enough problem that it would be best to just give it try.
Well, it works but the driver is crashing if I use to many threads. Should I divide my problem and call kernel few times to calculate all polynomials or there’s solution for the driver crash?
P.S Is theres any function in CUDA C that works same as rand(); ?
The driver crash when you increase the threads is covered here if you’re on windows:
CUDA C doesn’t have a built-in function like rand(), but you can use the CURAND library, or you can simply generate your random numbers on the host and then copy them to the device, if you don’t care about performance.
If the kernel run takes over a certain amount (2 seconds in Windows, not sure the limit for linux) it will cause a ‘timeout’. You can change the OS setting to turn that off, or adjust the maximum time for the OS to wait for a device kernel to run.
EDIT: txbob’s answer is more correct than mine…