I’m dealing with the problem of accurately calculating the modified Bessel function of zero-th order I0 in CUDA.

For a long time, I have been using a rational Chebyshev approximation according to the paper

J.M. Blair, “Rational Chebyshev approximations for the modified Bessel functions I_0(x) and I_1(x)”, Math. Comput., vol. 28, n. 126, pp. 581-583, Apr. 1974.

which, as compared to the result provided by Matlab, gives an average error of the order of 1e-29. Unfortunately, this seemingly high accuracy is not anymore enough for a new application I’m working on.

Matlab uses the Fortran routines developed by D.E. Amos

Amos, D.E., “A Subroutine Package for Bessel Functions of a Complex Argument and Nonnegative Order,” Sandia National Laboratory Report, SAND85-1018, May, 1985.

Amos, D.E., “A Portable Package for Bessel Functions of a Complex Argument and Nonnegative Order,” Trans. Math. Software, 1986.

which are available for download from the netlib/amos website.

There are ways to use those Fortran routines in a C/C++ code by compiling them in a library file and then using a C/C++ wrapper (see for example netlib_wrapping). I’m wondering if there is any means to make device functions out of those Fortran routines to be then called by a CUDA kernel).