Can't understand this performance hit

Any chance you could try what happens if you use a constant number of iterations, as I suggested?

Hi Tera,

I just did it. It does run fast but I have two problems:

1) Using float when computing r(rstar), I can only get 10e-4 precision, but I need at least 10e-6.

2) Even with 10e-4, for some values of rstar the number of iterations is large (slow convergence), while for others it is small (fast convergence). So setting a fixed number of iterations will probably not work for all r(rstar).

What I will probably try is to compute all the r(rstar) values before I call the kernel (using the GPU), store them in an array, and then pass that array to the kernel. I will see if this improves the speed.
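The precompute-and-pass idea could look roughly like the host-side sketch below. The thread does not show the actual relation between r and rstar, so the equation rstar = r + ln(r) is only a hypothetical stand-in, solved here with Newton's method; `solve_r`, `build_table`, and the grid parameters are illustrative names, not anything from the original code.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in relation: rstar = r + ln(r).
// The real r(rstar) is not shown in the thread.
double solve_r(double rstar, double tol, int max_iters) {
    assert(rstar >= 1.0);  // keeps the root at r >= 1 for this stand-in
    double r = 1.0;        // starting guess at or below the root
    for (int i = 0; i < max_iters; ++i) {
        double f = r + std::log(r) - rstar;   // residual of the stand-in equation
        double step = f / (1.0 + 1.0 / r);    // Newton step: f / f'
        r -= step;
        if (std::fabs(step) < tol) break;     // converged to the requested tolerance
    }
    return r;
}

// Fill a table of r values on a uniform rstar grid, once, before any kernel runs.
std::vector<double> build_table(double rstar_min, double rstar_step, int count) {
    std::vector<double> table(count);
    for (int i = 0; i < count; ++i) {
        table[i] = solve_r(rstar_min + i * rstar_step, 1e-12, 100);
    }
    return table;
}
```

The resulting array could then be copied to the device (for example with `cudaMemcpy`) and indexed by the kernel, so that each thread reads a value instead of running its own iteration loop.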

Thanks

deb

Hi Deb,

Thanks for confirming that the problem is due to slow convergence (at least for some arguments).
Of course this was just meant to pinpoint the problem, not proposed as a solution (at least on its own).

For a solution, look at my optimized routines and how they use mixing of previous iterations to accelerate convergence.
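As a rough illustration of combining previous iterates, here is a minimal sketch using Aitken's delta-squared extrapolation (Steffensen's method). This is one common acceleration technique and not necessarily what the optimized routines mentioned above do. The update `g(x) = cos(x)` is a hypothetical stand-in for the real iteration, which the thread does not show.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the real fixed-point update x = g(x).
double g(double x) { return std::cos(x); }

struct Solution {
    double x;   // final iterate
    int evals;  // number of calls to g
};

// Plain fixed-point iteration: stop when successive iterates agree to tol.
Solution solve_plain(double x0, double tol, int max_evals) {
    assert(max_evals > 0);
    double x = x0;
    int evals = 0;
    while (evals < max_evals) {
        double next = g(x);
        ++evals;
        double change = std::fabs(next - x);
        x = next;
        if (change < tol) break;
    }
    return {x, evals};
}

// Accelerated iteration: take two plain steps, then extrapolate from the
// three iterates x, x1, x2 with Aitken's delta-squared formula.
Solution solve_accelerated(double x0, double tol, int max_evals) {
    assert(max_evals >= 2);
    double x = x0;
    int evals = 0;
    while (evals + 2 <= max_evals) {
        double x1 = g(x);
        double x2 = g(x1);
        evals += 2;
        double denom = x2 - 2.0 * x1 + x;
        // Fall back to the plain iterate if the extrapolation is undefined.
        double next = (denom != 0.0) ? x - (x1 - x) * (x1 - x) / denom : x2;
        double change = std::fabs(next - x);
        x = next;
        if (change < tol) break;
    }
    return {x, evals};
}
```

For a slowly converging linear iteration like this stand-in, the accelerated version typically needs far fewer evaluations of `g` to reach the same tolerance, which also makes the iteration count vary less from one argument to the next.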

Regarding single vs. double precision, you can iterate in single precision until you reach 10e-4 accuracy and then switch to double precision until you get the required accuracy. In my example I just added one final iteration in double precision but of course you can easily use more.
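The single-then-double switch can be sketched as below. Again, `step` with `cos(x)` is only a hypothetical stand-in for the real update, and `coarse_tol`, `fine_tol`, and `max_iters` are placeholder parameters; the same structure would carry over to a CUDA device function.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the real update step; works in float or double.
template <typename T>
T step(T x) { return std::cos(x); }

struct Result {
    double x;          // final value in double precision
    int single_iters;  // iterations done in float
    int double_iters;  // iterations done in double
};

Result solve_mixed(float x0, float coarse_tol, double fine_tol, int max_iters) {
    assert(max_iters > 0);

    // Phase 1: cheap single-precision iterations down to the coarse tolerance.
    float xs = x0;
    int ns = 0;
    while (ns < max_iters) {
        float next = step(xs);
        float change = std::fabs(next - xs);
        xs = next;
        ++ns;
        if (change < coarse_tol) break;
    }

    // Phase 2: continue from the float result in double precision
    // until the fine tolerance is met.
    double xd = xs;
    int nd = 0;
    while (nd < max_iters) {
        double next = step(xd);
        double change = std::fabs(next - xd);
        xd = next;
        ++nd;
        if (change < fine_tol) break;
    }
    return {xd, ns, nd};
}
```

A call such as `solve_mixed(1.0f, 1e-4f, 1e-10, 200)` spends its first iterations in float and only pays for double-precision arithmetic during the final refinement.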

Thanks very much, Tera. Your suggestion of switching to double precision once I reach 10e-4 is quite neat. I hadn't thought of it.

deb