Add the C++ functions comp_ellint_1, comp_ellint_2, and comp_ellint_3 for GPU execution.

Hello,

I am working on an fMRI problem for which a faster version of these functions is essential. They are evaluated on the order of 10^12 times in my current computation, and I need to repeat that computation several hundred times. That is not feasible with the current computation time.

Best,
Joerg